Stabilized proteins

ABSTRACT

The invention described herein comprises methods for stabilizing polypeptides and polypeptide complexes, and the polypeptides and polypeptide complexes stabilized using the methods. To achieve stabilization, a cross-link reaction is controlled such that polypeptides and polypeptide complexes maintain their original functionality. In one embodiment, the invention provides a method for the identification of amino acid residues which, when cross-linked, are least disruptive to the structure and function of the polypeptide or polypeptide complex. In another embodiment, the invention provides a method for mutagenesis of identified residues to further control the cross-link reaction. Polypeptides and polypeptide complexes so stabilized can be utilized under a wide variety of physiological and non-physiological conditions. Further, the cross-link methodology disclosed herein may preclude the need for addition of exogenous structures to engineered proteins and complexes, such as peptide linkers that could be immunogenic and/or significantly decrease efficacy. In another embodiment, the invention provides a method for statistical analysis of databases of structural and/or sequence information available for polypeptides and polypeptide complexes to be stabilized. The statistical analysis identifies suitable residue pairs which are least likely to be disruptive of structure and function when cross-linked. Further, in a polypeptide chain or chains to be cross-linked, potentially undesirable reactive side-chains may be masked and protected, or altered using site-directed mutagenesis, e.g., to introduce a maximally conservative point mutation that will not support the cross-link reaction. The cross-link reaction conditions may also be adjusted to prevent undesired cross-links or other undesired side-effects. 
At residues identified as desirable positions for cross-linking, reactive side-chains may be introduced by site-directed mutagenesis, and the cross-link reaction is carried out using the conditions identified above.

This application is a continuation-in-part of PCT/US00/28595 filed Oct. 16, 2000, which claims priority of U.S. Provisional Application No. 60/159,763 filed Oct. 15, 1999, each of which is incorporated by reference herein in its entirety.

1. FIELD OF THE INVENTION

The present invention relates to cross-linking methods to stabilize polypeptides and polypeptide complexes for commercial uses (pharmaceutical, therapeutic, and industrial), and to polypeptides and polypeptide complexes so cross-linked.

2. BACKGROUND OF THE INVENTION

2.1. Structure and Function of Polypeptides and Polypeptide Complexes

A protein molecule consists of a linear polypeptide chain of amino acids that is intricately folded in three dimensions to form, e.g., interaction surfaces, binding pockets and active sites. A specific three-dimensional fold is generally required for protein function, wherein the fold itself is specified by the linear sequence of amino acids (i.e., the primary structure of the protein). It is notable, however, that dissimilar primary structures can have nearly identical three-dimensional folds. Evolution has conserved specific folds to a greater extent than specific primary structures. The protein folding process remains an active field of study. It is known, however, that secondary structure elements such as alpha helices, beta sheets and beta turns contribute to assembly of the tertiary structure of a polypeptide. A biological protein entity made up of several polypeptides is said to have quaternary structure.

Protein folding ultimately results from the interaction of intra- and inter-molecular forces. As such, a folded protein has a finite stability that translates into a finite structural and functional “half-life” in a given solvent environment. For example, in an aqueous environment, proteins attain stability in part by clustering hydrophobic residues in the protein core and hydrophilic residues at the protein-solvent interface. Accordingly, the activity half-life for a given protein is in part a function of solvent properties. Additionally, chemical bonds such as disulfides occur in nature to fix the co-ordination of non-neighboring side chains in close proximity in a folded protein, thereby stabilizing its structure and function.

In many biological systems, proteins associate with each other to form dimers or higher order multimers (i.e., quaternary structures), and only as such carry out their specific functions. The formation of such complexes is often an important event in regulating the activity of proteins. Various mechanisms have been found to regulate protein complex formation, such as ligand binding or post-translational modification. The functions of protein complexes can range from providing structure to the intra-cellular matrix, where, for instance, actin forms a structural lattice, to regulating transcription, as transcription factors do.

Proteins consist of discrete functional domains. Domains of similar or analogous function in different proteins usually show amino acid sequence similarities and are related in evolution. “Domain shuffling” has played a major role in the evolution (as well as in the gene engineering) of proteins with highly diverse functionalities. Interaction domains, for example, can be found in proteins of many different functions; however, sequence similarities reveal their presence. Crystallographic studies have shown that related domains are even more conserved in secondary, tertiary and quaternary structure than in primary amino acid sequence, such that structural inferences can be made about a particular domain if structural data is available on one or preferably multiple related domains (see e.g., Hofmann K., Cell Mol. Life Sci. vol. 55(8-9): pp. 1113-28, 1999; Chou J. J. et al., Cell vol. 94(2): pp. 171-80, 1998).

2.2. Biocatalytic Enzymes

There are numerous conceivable commercial applications of stabilized proteins, protein complexes and protein-protein interactions. As an example of a class of proteins for which stabilization is desirable, enzymes and other proteins that have been used as biocatalysts in industrial applications are considered in this section. Valuation of the biocatalytic enzyme market is also considered.

Industrial biocatalytic processes have use in many industrial sectors, including the chemical, detergent, pharmaceutical, agricultural, food, cosmetics, textile, materials-processing, and paper industries. Within these industries, biocatalysts have many applications, ranging from product synthesis (e.g., amino acid manufacturing) and use as active agents in certain products (e.g., biological washing powders) to use in diagnostic testing equipment and use as therapeutic agents. Total sales of industrial biocatalysts in 1999 were roughly $1.4 billion. This figure is expected to grow significantly over the next decade as biocatalyst applications are enabled by novel technologies such as the invention described herein.

Market sectors believed to have potential for growth and technological innovation include engineered enzymes (e.g., for providing faster throughput, cheaper production, and/or the capability to produce novel products), pollution-control systems (e.g., for bioremediation), and non-aqueous biocatalytic systems (e.g., for oil and fat bioprocessing and drug manufacture) (see Business Intelligence Center, Explorer: “BIC Explorer”; Business Opportunities in Technology Commercialization).

Historically, only a handful of fine chemical companies, such as DSM, Lonza and Avecia Ltd., have embraced and invested in biocatalytic processes. More recently, however, there have been several significant corporate investments in the field of biocatalysis. One example of such an investment is Bayer's recent announcement that it will use 6-7% of fine chemical sales to develop enzyme-based processes for certain molecules.

Major customers of fine chemical companies tend to favor suppliers with a broad range of process development. This consideration suggests that those with biocatalytic expertise stand to gain a further competitive edge in the marketplace. Some firms have recognized this and are trying quickly to close the gap via acquisitions (e.g., Great Lakes's acquisition of NSC Technologies and Cambrex's purchase of Celgene). Others acknowledge that they will lose out on further business opportunities if they don't do something to access the basic skills required for biocatalysis (Joe Blanchard, Altus Biologics Inc., 1999).

Major enzyme manufacturers (e.g., Novo, Genencor, Roche, etc.) tend to focus on large-scale enzyme production for the major industrial markets (such as detergents and textiles) and not on the application of enzymes for fine chemical development (Joe Blanchard, Altus Biologics Inc., 1999).

The continued growth in interest in the commercial use of biocatalysis and the fragmentation of the biocatalyst industry will allow both large and small companies to exploit innovative biocatalysts and the products and processes that utilize them (BIC Explorer: Business Opportunities in Technology Commercialization, 1999).

Bioremediation applications may, in the future, turn into one of the most economically important applications of biocatalytic enzymes. For example, approximately 2.3 trillion gallons of municipal effluent and 4.9 billion gallons of industrial waste are passed into U.S. waters each year, and approximately 1 million gallons of hydrocarbons enter our environment per day. Hydrocarbon cleansing is a routine requirement for various commercial operations (e.g., oil tankers, marine bilges, storage, fuel and truck tanks).

Currently, there are several processes in development that utilize biocatalysts for decontamination/decomposition of both hydrocarbons and wastewater. Not only are these processes commercially the most promising systems due to efficiency and low costs, but they are also the cleanest.

Furthermore, biocatalytic desulfurization is an inexpensive and attractive technology to the crude oil production market, where low-sulfur crude oil commands a premium price over high-sulfur crude oil. There is a growing need for cost-effective sulfur management and desulfurization worldwide due to an increased level of sulfur in fossil fuels and increasingly stringent regulations requiring lower sulfur emissions. Compliance with these regulations is expected to cost the European refining industry alone more than $50 billion in capital and $10 billion annually in operating expenditures.

All catalyst manufacturing in 1997 represented a $10 billion-plus market in the U.S., a figure quoted by the American Chemical Society (see also, “Catalyst Industry Stresses Need for Partners as Key to Future Success,” C&E News, Jul. 11, 1994; CatCon '96 presentations by T. Ludermann of CONDEA Chemie GmbH, Paul Lamb of Englehard Corporation, and J. Ohmer and K. Herbert of Degussa Corporation). According to Maxigen, the total industrial enzymes market (a segment of the catalyst manufacturing market) is estimated at $1.4 billion today, growing at roughly 10% annually.

2.3. Stabilization Strategies

Several protein stabilization strategies are known in the art and have been previously described, as highlighted below.

2.3.1. Stabilization of Biocatalytic Enzymes

Several approaches have been taken to enhance the stability of biocatalysts. On the protein level, the most prominent approaches include discovery of stable biocatalysts from investigation of thermophilic organisms, directed evolution, and computational and protein engineering, as described below.

Thermophilic organisms, or ‘extremophiles’, are sought in extreme environments such as deep-sea vents and Yellowstone geysers. Although enzymes of commercial relevance have been identified from them, this ‘discovery’ approach is limited by what can be found in nature. This approach has not yielded as many commercially-relevant, thermostable biocatalysts as was initially hoped for and/or projected.

‘Directed evolution’ techniques are powerful approaches capable of generating stabilized enzymes, often also with altered/improved functional specificities. However, the approach is limited by the feasibility of the selection procedure.

Algorithms that calculate intra-molecular forces within proteins are being used to design and/or evolve enzymes with greater thermostability in silico. This approach is still severely hampered by the limited understanding of the intra-molecular forces and the processes involved in protein folding.

Addition of chemical modifications that can hold proteins in their correct conformation is often referred to as protein engineering. Such protein engineering approaches include derivatization (e.g., PEGylation, addition of polymeric sucrose and/or dextran, methoxypolyethylene glycol, etc.) and older methods of protein cross-linking (e.g., production of cross-linked enzyme crystals, or CLECs). Unfortunately, these approaches are often ineffectual or cause dramatic losses in activity.

Strategies for the operational stabilization of biocatalysts that have proven successful in some respects include (a) catalyst immobilization and (b) the use of organic solvents in the reaction medium (termed medium engineering). Thermal stability upon immobilization is the result of molecular rigidity and the creation of a protected microenvironment. Methods include multi-point covalent attachment and gel-entrapment. Immobilization of biocatalysts is the most used strategy as additional benefits are obtained, such as flexibility of reactor design and facilitated product recovery without catalyst contamination. However, despite its great technological potential, few large-scale processes utilize immobilized enzymes. Severe restrictions often arise in scale-up because of additional costs, activity losses, and issues regarding diffusion.

The main purpose of medium engineering in biocatalysis was originally to utilize robust commercial hydrolytic enzymes in organic synthesis. However, enhanced thermostability in organic media has proven an additional and significant bonus. It is hypothesized that partial or almost total substitution of water is beneficial since water is involved in enzyme inactivation. Whatever the mechanism, numerous cases have recently been reported where remarkable enzyme stability has been obtained in organic media such as polyglycols and glymes. Despite this advance, medium engineering is unlikely to solve all biocatalysis stability problems.

Some of the most promising solutions to biocatalysis problems have combined evolutionary approaches with operational stabilization techniques, such as using directed evolution to generate enzymes with higher reaction rates in organic solvents. Such combined approaches may provide significant synergies which maximally improve upon and enable commercially-relevant biocatalytic processes. In principle, the invention described herein below can be applied in combination with any of the above-mentioned known stabilization approaches.

2.3.2. Stabilization of Other Proteins

Molecular biological techniques have made it possible to stabilize some proteins by, e.g., engineering fusion-proteins. Some fusion proteins have even displayed novel functionalities. To make a fusion-protein, a single nucleic acid construct is created that directs the expression of modular domains derived from at least two proteins as one protein. Due to fusion, two domains can be held in very close proximity to each other, thereby making the local concentration of each domain very high with respect to the other. In this way, a functional complex is stabilized. For example, homo- and heterodimers of the interleukin 8 family have been stabilized in this way, maintaining functionality similar to wild type (Leong S. R. et al. Protein Sci.; vol. 6(3): pp. 609-17, 1997). Another example of protein complexes stabilized in this way is immunoglobulin Fv fragments, which consist of the variable domains of immunoglobulin heavy and light chains and lack the stabilizing effect of inter-chain disulfide bonds. It is necessary to stabilize the complex by another means to maintain the affinity of the immunoglobulin complex, and expression of both polypeptides as a single chain is one of the methods used (Pluckthun and P. Pack. Immunotechnology; vol. 3(2): pp. 83-105, 1997).

However, in the design of pharmacological reagents, it is often disadvantageous to create fusion proteins that require a linker sequence to stabilize them. For example, such linkers introduce non-self epitopes which are often recognized by the organism as foreign and elicit immune responses. This reduces the efficacy of such therapeutics and/or diagnostics because the reagents are then cleared by the immune system (see, for example, Raag R. and Whitlow M. FASEB; vol. 9: pp. 73-80, 1995). In the case of single chain Fv fragments, the linker, which is most frequently chosen to be a highly flexible structure, allows the complex to disassociate, since the affinity of the two polypeptides to each other is low. The single chain Fv fragments then aggregate, or clump, and thereby lose their functionality (Webber K. O. et al. Mol. Immunol.; vol. 32(4): pp. 249-258, 1995). More rigid linkers that lend the complex more stability, and would thereby decrease the level or speed of aggregation and loss of functionality, are associated with increased immunogenicity (Raag R. and Whitlow M. FASEB; vol. 9: pp. 73-80, 1995).

Cross-linking the domains at close contact sites would circumvent these problems, provided that the cross-link between two proteins can be directed to surfaces of the proteins where the cross-link is buried after the reaction. One such means is to stabilize complexes by introducing a disulfide bond between two polypeptides by introducing point mutations to cysteine in both polypeptide chains. The mutations are introduced at positions that allow the formation of such bonds (see, for example, Reiter Y. et al. Nat Biotech.; vol. 14: pp. 1239-1245, 1996; Pastan et al. U.S. Pat. No. 5,747,654, issued May 5, 1998).

Disulfide bonds are, however, unstable under many physiological conditions (Klinman J. P. (ed). Methods in Enzymology; vol. 258, 1995). Physiological conditions vary widely, for instance with respect to redox potential (oxidizing vs. reducing) and acidity (high vs. low pH) of the various physiological milieus (intracellular, extracellular, pinocytosis vesicles, gastro-intestinal lumen, etc.). Disulfide bonds are found in nature only in extracellular proteins, and they are known to fall apart in reducing environments, such as the intracellular milieu. But even in the extracellular milieu, many engineered disulfide bonds are unstable.

Several other chemical cross-link methodologies allow the formation of bonds that are stable under a broad range of physiological and non-physiological pH and redox conditions. However, in order to maintain the complex's activity and specificity, it is necessary that the cross-link is specifically directed and controlled such that, first, the overall structure of the protein is minimally disrupted, and second, the cross-link is buried in the protein complex so as not to be immunogenic. But with most cross-link methodologies, the degree to which it is possible to direct the bond to a specific site is too limited to allow them to be used for most bio-pharmaceutical and/or diagnostic applications. Examples of such cross-link methodologies include UV-cross-linking, and treatment of protein with formamide or glutaraldehyde.

2.3.3. Fv Fragments

Immunoglobulin Fv fragments comprise another example of a class of proteins for which stabilization is desirable. Immunoglobulin Fv fragments are the smallest fragments of immunoglobulin complexes shown to bind antigen. Fv fragments consist of the variable regions of immunoglobulin heavy and light chains and have broad applicability in pharmaceutical and industrial settings.

Value of Fv Fragment Market

A recent analysis estimated that 20 to 40 percent of all bio-technological therapeutics and diagnostics currently in development are based on immunoglobulin (Pharmaceutical Research and Manufacturers of America. New Medicines in Development, Survey. 1998). Furthermore, a significant portion of Ig-based therapeutics and diagnostics in development, and the majority of those considered current “state of the art”, are Fv fragment-based (Price Waterhouse: Survey of Biopharmaceutical Industry, 1998). For reviews of the utility of immunoglobulin as a pharmacological agent, see Penichet M. L. et al., Hum Antibodies; vol. 8(3): pp. 106-18, 1997; Sensel M. G. et al. Chem. Immunol.; vol. 65: pp. 129-58, 1997; Reiter Y. and Pastan I. TIBTECH; vol. 16(12): pp. 513-520, 1998; Reiter Y. et al. Nat Biotech.; vol. 14: pp. 1239-1245, 1996; Pluckthun and P. Pack. Immunotechnology; vol. 3(2): pp. 83-105, 1997; Wright A. and Morrison S. L. Trends Biotechnol.; vol. 15(1): pp. 26-32, 1997; Schwartz M. A. et al. Cancer Chemother. Biol. Response Modif.; vol. 13: pp. 156-74, 1992; Houghton A. N. and Scheinberg D. A. Semin Oncol.; vol. 13(2): pp. 165-79, 1986; and Cao Y. and Suresh M. R. Bioconjugate Chemistry; vol. 9(6): pp. 635-644, 1998.

Following the successful introduction of the first Ig-based biotech drug, ReoPro by Centocor, in 1994, six more Ig-based drugs were approved in 1997 and 1998 and six more were in phase III clinical trials as of the end of 1998. Sales of a single, clinically successful, immunoglobulin-based product can result in annual revenues on the order of several hundreds of millions of dollars (Pharmaceutical Research and Manufacturers of America. New Medicines in Development, Survey, 1998). Together, these facts give evidence of the commercial and clinical value of these types of products.

The cost of developing, producing and clinically testing such products is, however, immense and the risk of failure is often great. Because of this, any technology that can either increase the product's effectiveness, broaden its range of applications or increase its chances of succeeding in clinical trials will add enormously to the Net Present Value of a product in development (Boston Consulting Group: The Contribution of Pharmaceutical Companies: What's at stake for America, 1993).

Fv Fragment Stabilization Methods

To date, a variety of methodologies have been employed to stabilize engineered antibodies. First, introduction of additional disulfide bonds has been performed through molecular biological manipulation of the antibody-expressing construct (Reiter Y. and Pastan I. TIBTECH; vol. 16(12): pp. 513-520, 1998). Second, introduction of a linker has been employed that allows both fragments to be expressed as a single chain (single chain Fv fragments) (Pluckthun and P. Pack. Immunotechnology; vol. 3(2): pp. 83-105, 1997; Cao Y. and Suresh M. R. Bioconjugate Chemistry; vol. 9(6): pp. 635-644, 1998). Finally, fusion of an exogenous di- or oligomerization domain to each of the Fv fragment chains has been performed (Pluckthun and P. Pack. Immunotechnology; vol. 3(2): pp. 83-105, 1997; Cao Y. and Suresh M. R. Bioconjugate Chemistry; vol. 9(6): pp. 635-644, 1998; see also Antibody Engineering Page, IMT, University of Marburg, FRG: http://aximt1.imt.uni-marburg.de/_rek/indexfenster.html).

However, all of these technologies have significant drawbacks. Disulfide bonds are a suitable bond in the context of Fab fragments (see FIG. 1D), and many other extra-cellular proteins, to stabilize protein complexes. Furthermore, the introduction of disulfide bonds avoids the need to introduce foreign peptides, and the resultant stabilized complexes are minimally immunogenic. Nonetheless, the introduction of disulfide bonds in Fv fragments by molecular biological means results in complexes that are insufficiently stable under many commercially relevant, physiological conditions, such as the intracellular milieu and sometimes even serum. As such they have limited usefulness in the pharmaceutical context.

With single chain Fv fragments there is a trade-off between the stability of the complex and its immunogenicity in a therapeutic or in vivo diagnostic context. Linkers that result in stable conjugates are more rigid structures and elicit immune responses, which in turn results in decreased utility. Linkers that are not immunogenic are generally the more flexible linkers that provide insufficient stability (see above, Raag R. and Whitlow M. FASEB; vol. 9: pp. 73-80, 1995).

Fv fragments stabilized by fusion to multimerization domains are significantly immunogenic, and lack the most significant advantage of Fv fragments in the first place: reduced size and resultant increased tissue penetration.

Other currently available chemical cross-link methods, such as UV cross-linking (see above), are severely limited in the degree to which it is possible to direct the bond to a specific site. As bio-pharmaceutical and/or diagnostic applications require the maintenance of the polypeptide's function, specificity in the cross-link reaction is paramount.

2.4. The Tyrosyl-Tyrosyl Oxidative Cross-Link

Oxidative cross-link reactions between tyrosyl side-chains have been demonstrated to occur naturally. For example, cytochrome c peroxidase compound I has been demonstrated to form di-tyrosine bonds during the endogenous reduction of its active site (Spangler B. D. and Erman J. E. Biochim. Biophys. Acta; vol. 872(1-2): pp. 155-7, 1986), and di-tyrosine-linked dimers of gammaB-crystallin are reportedly associated with cataractogenesis of the eye lens. In vitro, di-tyrosine protein-protein links are readily formed photodynamically in the presence of sensitizers (Kanwar R. and Balasubramanian D. Exp. Eye Res.; vol. 68(6): pp. 773-84, 1999). Furthermore, protein cross-linking through the formation of di-tyrosine bonds can be catalyzed, for example, by peroxidase (Gmeiner B. and Seelos C. FEBS Lett; vol. 255(2): pp. 395-7, 1989), by metallo-ion complexes (Campbell et al. Bioorganic and Medicinal Chemistry, vol. 6: pp. 1301-1037, 1998; Brown K. C. et al. Biochem.; vol. 34(14): pp. 4733-4739, 1995), and by light-triggered oxidants (Fancy D. A. and Kodadek T. Proc. Natl. Acad. Sci., U.S.A.; vol. 96: pp. 6020-24, 1999).

As described by Campbell et al., in the presence of an appropriate catalyst and an appropriate oxidizing reagent, an oxidative cross-link reaction can occur between tyrosyl side-chains of proteins that are properly spaced. In this reaction, the hydroxyl groups of the tyrosyl side-chains react with each other, an H₂O molecule is released, and the side-chains are linked by a covalent bond. This reaction is thought to proceed through a high-valent metallo-oxo complex which abstracts an electron from an accessible tyrosyl side-chain, followed by covalent coupling of the resultant tyrosyl radical with another tyrosyl side-chain that is in sufficient proximity.

This cross-link methodology was originally developed to cross-link proteins that interact in cell lysates, as a proxy to the in vivo situation, to enable the study of the functionality of proteins by identifying other proteins they interact with. The reaction only occurs with tyrosine side-chains that are in very close proximity to each other. Furthermore, the bond formed between the tyrosyl side-chains is irreversible and stable under a very wide range of physiological conditions.

None of the above-cited references disclose or suggest methods using di-tyrosyl cross-linking for formation of buried chemical cross-links for stabilizing a protein complex while maintaining the complex's activities and specificities. Accordingly, a need exists for such methods wherein the product is functional under a wide range of physiological and non-physiological conditions, and wherein the structure, function, and specificity of the cross-linked protein complex are maintained.

Citation or identification of any reference in Section 2 or any other section of this application shall not be construed as an admission that such reference is available as prior art to the present invention.

3. SUMMARY OF THE INVENTION

This invention provides a method for stabilization of a polypeptide or polypeptide complex, by the introduction of intra-polypeptide and/or inter-polypeptide di-tyrosine bonds, which simultaneously maintains the structure and function of the polypeptide or polypeptide complex. Further, this invention provides various methods for optimizing protein stabilization. Such methods include statistical analyses of the primary amino acid sequences of related proteins (two-dimensional data analysis) and statistical analyses of the three-dimensional coordinates of proteins believed to be related in three-dimensional structure (three-dimensional data analysis).

Further, this invention provides stabilized polypeptides and polypeptide complexes. To achieve stabilization, the cross-link reaction is carefully controlled such that polypeptides and polypeptide complexes maintain their original functionality. In one embodiment, the invention provides a method for the identification of amino acid residues which, when cross-linked, are least disruptive to the structure and function of the polypeptide or polypeptide complex. In another embodiment, the invention provides a method for mutagenesis of identified residues to further control the cross-link reaction. Polypeptides and polypeptide complexes so stabilized can be utilized under a wide variety of physiological and non-physiological conditions. Further, the cross-link methodology disclosed herein may preclude the need for addition of exogenous structures to engineered proteins and complexes, such as peptide linkers. In another embodiment, the invention provides a method for statistical analysis of databases of structural and/or sequence information available for polypeptides and polypeptide complexes to be stabilized. The statistical analysis identifies suitable residue pairs which are least likely to be disruptive of structure and function when cross-linked. Further, in a polypeptide chain or chains to be cross-linked, potentially undesirable reactive side-chains may be altered using site-directed mutagenesis, e.g., to introduce a maximally conservative point mutation that will not support the cross-link reaction. The cross-link reaction conditions may also be adjusted to prevent undesired cross-links. At residues identified as desirable positions for cross-linking, reactive side-chains may be introduced by site-directed mutagenesis, and the cross-link reaction is carried out using the conditions identified above.

4. BRIEF DESCRIPTION OF THE FIGURES

The present invention may be understood more fully by reference to the following detailed description, illustrative examples of specific embodiments, and the appended figures.

FIG. 1. The dityrosyl cross-link and example proteins which can be stabilized according to methods of the invention. A. Schematic representation of a dityrosyl cross-link. Addition of a cross-linking catalyst and an oxidizing reagent to a protein or protein complex preparation wherein at least two tyrosine residues occur in close proximity and in proper orientation results in a dityrosyl cross-link and one water molecule. B. Schematic representation of the canonical fold of α/β hydrolases, a group of enzymes which includes lipases. The topological positions of the active site residues are indicated as solid circles. From K.-E. Jaeger et al., 1999, Ann. Rev. Microbiol. 53, 315-351. C. Schematic representation of secondary structure of Candida antarctica lipase B. The topological positions of the active site residues are indicated as residues S105, D187, and H224. From J. Uppenberg et al., 1994, Structure 2, 293-308. D. Schematic representation of an immunoglobulin molecule (IgG). The immunoglobulin hetero-tetramer comprises two identical light chains and two identical heavy chains. The complex is stabilized by inter-chain disulfide bonds; the disulfide bonds are indicated by the “S-S” links in the schematic representation. Both antigen-binding domains, one at either end of the “fork”, consist of a pair of heavy and light chain variable regions, and are referred to as the “Fv fragments”. The antigen-binding domain is the Fv fragment, consisting of the variable regions of both the heavy and light chains; each variable region consists of four relatively conserved Framework Regions that provide the overall structure, and of three Complementarity Determining Regions that lend the Fv fragment its specificity for a specific antigen. The Fab fragment, which comprises both the light and heavy chain variable regions (Vl & Vh), the constant region of the light chain (Cl), and the first constant region of the heavy chain (Ch1), is stabilized by an inter-chain disulfide bond. In the Fv fragment none of the immunoglobulin inter-chain disulfide bonds are present, as indicated, resulting in the requirement for this protein complex to be stabilized artificially.

FIG. 2. A. Schematic representation of a tyrosyl side-chain, consisting of an alpha carbon (A), which is still part of the polypeptide back-bone, a beta carbon (B), the first atom in the side-chain not part of the back-bone, an aromatic ring, which, in turn, consists of six carbon atoms, and a hydroxyl group (OH). The angle β at the beta carbon between the beta carbon-hydroxyl oxygen axis and the alpha carbon-beta carbon bond is indicated. B. Schematic representation of a tyrosyl-tyrosyl bond indicating, in addition to the angle β, the angle ω, which is the angle between the dityrosyl bond and the carbon-carbon bond in the aromatic ring of the cross-linked tyrosyl side chain that is proximal to the beta-carbon of the same side chain, projected into the plane of the two aromatic rings. Also indicated are the angle α, the angle between all carbon atoms in the plane of the aromatic rings (120°), and the degrees of rotational freedom (1) in the dityrosine bond itself, and (2) of the alpha carbon around the beta carbon-gamma carbon (most proximal carbon atom in the aromatic ring) axis. C. Three-dimensional angles formed by the alpha carbon-alpha carbon axis, the beta carbons (ψ and φ), and the two planes (χ) described by the alpha carbon-alpha carbon axis and (1) the alpha carbon-beta carbon bond of the first chain (A1-B1), and (2) the alpha carbon-beta carbon bond of the second chain (A2-B2).

FIG. 3. The angle ω, indicated in FIG. 2B, is +120°. For this configuration, the alpha carbon distances, angles ψ and φ, and the alpha-beta distance differences (see text) are represented geometrically for maximal and minimal configurations (that fall into one plane), given this angle ω. The angle β is 109.5°, the tetrahedral angle of carbon atoms, and complete rotational freedom of the alpha carbon around the beta carbon-gamma carbon axis is assumed. In A, the length c is the distance between the two carbon atoms of a carbon-carbon bond; the length v is cos((180°−α)/2) × c, the length h is sin((180°−α)/2) × c, length a is half of the square root of the sum of 7v squared and h squared, and the length b is the square root of the sum of the square of (a+v) and h squared. In B, v is cos(180°−(β−(180°−α)/2+arctan(h/7v)) × c, h is sin(180°−(β−(180°−α)/2+arctan(h/7v)) × c, and, analogously, length a is half of the square root of the sum of 7v squared and h squared, and the length b is the square root of the sum of the square of (a+v) and h squared. In the configuration depicted in A, at which the alpha carbon distance is maximal, the angles ψ and φ are (180°−α)/2−arctan(h/7v); in the configuration in B, at which the alpha carbon distance is minimal for an angle ω of +120°, ψ and φ are β−(180°−α)/2−arctan(h/7v).
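The panel A relations of FIG. 3 can be checked numerically. The following is a minimal sketch only, under stated assumptions: “7v squared” is read as (7v)², which is the reading consistent with the arctan(h/7v) term; the function name `fig3_panel_a` is hypothetical; and the bond length c = 1.0 is a placeholder unit length rather than a measured value.

```python
import math

def fig3_panel_a(c: float, alpha_deg: float = 120.0):
    """Return (v, h, a, b, psi_deg) for the maximal configuration of FIG. 3, panel A.

    Assumption: "7v squared" in the figure description means (7 * v) ** 2.
    """
    half = math.radians((180.0 - alpha_deg) / 2.0)
    v = math.cos(half) * c                       # v = cos((180 - alpha)/2) x c
    h = math.sin(half) * c                       # h = sin((180 - alpha)/2) x c
    a = 0.5 * math.sqrt((7 * v) ** 2 + h ** 2)   # half the root of (7v)^2 + h^2
    b = math.sqrt((a + v) ** 2 + h ** 2)         # root of (a + v)^2 + h^2
    # psi = phi = (180 - alpha)/2 - arctan(h / 7v), in degrees
    psi_deg = (180.0 - alpha_deg) / 2.0 - math.degrees(math.atan(h / (7 * v)))
    return v, h, a, b, psi_deg

# c = 1.0 is a placeholder unit length, so all lengths are in units of c.
v, h, a, b, psi = fig3_panel_a(c=1.0)
print(f"v={v:.4f} h={h:.4f} a={a:.4f} b={b:.4f} psi={psi:.2f} deg")
```

Because every length scales linearly with c, substituting an actual carbon-carbon bond length rescales v, h, a, and b without changing the angles.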

FIG. 4. The angle ω, indicated in FIG. 2B, is −120°. In FIG. 4, the alpha carbon distances, angles ψ and φ, and the alpha-beta distance differences (see text) are represented geometrically for maximal and minimal configurations (that fall into one plane), given this angle ω. The angle β is kept constant at 109.5°, the tetrahedral angle of carbon atoms, and complete rotational freedom of the alpha carbon around the beta carbon-gamma carbon axis is assumed. In A, the length x is 4v, the length y is the square root of the sum of h squared and 3v squared, the length z is cos(180°−120°+arctan(h/3v)) × y, the length a is half of the square root of the sum of (x+z) squared and y squared, the length v is cos(120°−β) × c, and the length b is the sum of the lengths a and v. In B, the length v is cos(β−2 × (180°−α)/2) × c, and the length b is the difference of the lengths a and v. In the configuration depicted in A, at which the alpha carbon distance is maximal for an angle ω of −120°, ψ and φ are α−β; in the configuration in B, at which the alpha carbon distance is minimal, ψ and φ are 180°−(β−2 × (180°−α)/2).

FIG. 5. Structural Coordinate Data, the primary (or input) data of a 3-D database. First two amino acid residues of a representative Fv fragment heavy (H) and light (L) chain, in Angstroms; the data for each atom are represented in rows, and the data fields are listed in columns. Coordinate data are represented for all residue atoms other than hydrogen atoms, including those involved in the polypeptide backbone and those in the amino acid's side-chain. In the left-hand column, under the heading “Chain”, the identity of the polypeptide chain with which an atom's coordinates are associated is listed. An Fv fragment consists of two polypeptides: a heavy chain (H; below) and a light chain (L; above). The number under the heading “K&W” indicates the position of the atom's residue within the Kabat & Wu (K&W) alignment system. Under the heading “Atom”, the identity of an atom of the specific amino acid present in the representative polypeptide at that particular residue is indicated (the amino acid is identified under the heading “Amino Acid” in three letter code). The x, y, and z three-dimensional coordinates of each atom are represented in the right-hand columns, as indicated.

FIG. 6. Schematic representation of three actual Fv fragment entries into a 3-D database. Arrays of alpha-carbon coordinate data of heavy and light chain residues of the Fv fragments, and, as an example of relevant derivative data, calculated inter-chain, inter-atomic distances. Heavy chain alpha-carbon data is represented in rows, as described in the description of FIG. 5, and the light chain alpha-carbon data described in FIG. 5 is transposed and represented in columns. Derivative data describing the inter-chain, 3-D relationships of the atoms on both chains is represented at the intersection of each heavy chain row and light chain column.
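The row-by-column layout described for FIG. 6 can be illustrated with a short sketch. It assumes each chain is given as a list of (x, y, z) alpha-carbon coordinates in Angstroms; the function name `interchain_distances` and the coordinates are hypothetical placeholders, not entries from the database.

```python
import math

def interchain_distances(heavy, light):
    """Distance matrix: one row per heavy-chain residue, one column per light-chain residue."""
    return [[math.dist(h_atom, l_atom) for l_atom in light] for h_atom in heavy]

# Placeholder alpha-carbon coordinates (Angstroms); not real Fv fragment data.
heavy_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
light_ca = [(0.0, 3.0, 4.0), (3.8, 0.0, 10.0)]

matrix = interchain_distances(heavy_ca, light_ca)
for row in matrix:
    print(["%.2f" % d for d in row])
```

The entry `matrix[i][j]` corresponds to the intersection of heavy chain row i and light chain column j.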

FIG. 7. Statistical measurements in a 3-D database of alpha carbon distances between Fv fragment heavy and light chain residue pairs, as an example of relevant derivative data. A. Illustrative statistical measurements of the alpha carbon distances between residue pairs of the three representative Fv fragment heavy and light chains in the description of FIG. 6 (i.e. data shown for n=3). B. Actual statistical measurements of the alpha carbon distances between the residue pairs of all Fv fragment heavy and light chains in the sample of Fv fragments used for the selection (data shown for n=17).
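A per-pair summary across several fragments might be computed as sketched below. This is an illustration under assumptions: the fragments are taken to be aligned (for example by the Kabat & Wu numbering) so that all distance matrices share one shape, and the mean and sample standard deviation are chosen here as representative measures; the figure description does not specify which statistics are used. The name `pairwise_stats` is hypothetical.

```python
import statistics

def pairwise_stats(matrices):
    """For each (heavy, light) residue pair, return (mean, sample SD) across fragments.

    matrices: list of equally shaped distance matrices, one per Fv fragment.
    """
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    stats = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            values = [m[i][j] for m in matrices]
            row.append((statistics.mean(values), statistics.stdev(values)))
        stats.append(row)
    return stats

# Three placeholder 1 x 2 matrices standing in for n=3 fragments.
example = [[[5.0, 9.0]], [[6.0, 9.5]], [[7.0, 10.0]]]
print(pairwise_stats(example))
```

A pair with a small standard deviation is one whose distance is conserved across the sample, which is the kind of information such a table would make visible.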

FIG. 8. Schematic representation of an Fv fragment entry (Fv Fragment 1 of FIG. 6) into a 3-D database. Arrays of beta-carbon coordinate data of heavy and light chain residues of the Fv fragment, and, as an example of relevant derivative data, calculated inter-chain, inter-atomic distances. Heavy chain beta-carbon data is represented in rows, and light chain beta-carbon data is transposed and represented in columns, as described in the description of FIG. 5. Derivative data describing the inter-chain, 3-D relationships of the atoms on both chains is represented at the intersection of each heavy chain row and light chain column.

FIG. 9. Schematic representation of the approach taken to calculate the differences between the inter-chain, inter-atomic residue pair alpha-carbon and beta-carbon distances (‘alpha-beta distance differences’) for an individual Fv fragment in the 3-D database (Fv Fragment 1 of FIGS. 6 and 8). Heavy chain alpha- (top) and beta-carbon (middle) data is represented in rows, and light chain alpha- and beta-carbon data is transposed and represented in columns, as described in the description of FIG. 5. Derivative data describing the inter-chain, inter-atomic distances in the top and middle panels, and the alpha-beta distance differences in the bottom panel, is represented at the intersection of each heavy chain row and light chain column.
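The three-panel calculation described for FIG. 9 can be sketched as follows. The sign convention (alpha-carbon distance minus beta-carbon distance) is an assumption, as are the function name `alpha_beta_differences` and the placeholder coordinates. Glycine has no beta carbon, so such residues would need separate handling that this sketch does not attempt.

```python
import math

def alpha_beta_differences(heavy_ca, heavy_cb, light_ca, light_cb):
    """Matrix of (alpha-carbon distance - beta-carbon distance) per residue pair.

    Assumed convention: a positive value means the beta carbons lie closer
    together than the alpha carbons, i.e. the side chains point toward each other.
    """
    diffs = []
    for h_ca, h_cb in zip(heavy_ca, heavy_cb):
        row = []
        for l_ca, l_cb in zip(light_ca, light_cb):
            row.append(math.dist(h_ca, l_ca) - math.dist(h_cb, l_cb))
        diffs.append(row)
    return diffs

# Placeholder coordinates (Angstroms) for one heavy and one light chain residue.
heavy_ca, heavy_cb = [(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]
light_ca, light_cb = [(10.0, 0.0, 0.0)], [(9.0, 0.0, 0.0)]
print(alpha_beta_differences(heavy_ca, heavy_cb, light_ca, light_cb))
```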

FIG. 10. Alpha-beta distance difference data, derived as described in FIG. 9, of representative Fv fragments (Fv fragments 1, 2, and 3 of FIG. 6) in a 3-D database. Heavy and light chain residues are represented in arrays, where the heavy chain residues are listed vertically, and the light chain residues are listed horizontally. Data correlated with heavy and light chain residues is represented at the intersection of each heavy chain row and light chain column.

FIG. 11. Statistical measurements in a 3-D database of alpha-beta distance differences of Fv fragment heavy and light chain residue pairs, as an example of relevant derivative data. A. Illustrative statistical measurements of the alpha-beta distance differences of the pairs between the three representative Fv fragment heavy and light chains in FIG. 6 (i.e. data shown for n=3). B. Actual statistical measurements of the alpha-beta distance differences of the pairs between all Fv fragment heavy and light chains in the sample of Fv fragments used for the selection (data shown for n=17).

FIG. 12. Quantification of amino acid side-chain physical properties, as an example of relevant derivative data, at (the first four, representative) residues of the Fv fragment heavy chain, based on Fv fragment polypeptide sequence data, compiled in a 2-D database. A. Amino Acid Sequence Data. Representation of primary data compiled in a 2-D database. Amino acids (AA) occurring at each residue are sorted by the frequency (F) of their occurrence at that specific residue. B. Amino Acid Side-chain Quantification Tables. Representation of numeric values used in a 2-D database to obtain relevant derivative data by quantifying the physical properties of amino acids: e.g. van der Waals volume [Å³] (Richards, F. M.) and numeric hydrophobicity values (Eisenberg, D.). C. Quantification of the physical properties, exemplified here by van der Waals volumes, of the amino acid side-chains present at each residue in the sample of Fv fragment sequences in the 2-D database.

FIG. 13. Statistical measurements in a 2-D database of side-chain physical properties at each residue of Fv fragment heavy chains present in the 2-D database (sample), as an example of relevant derivative data, quantified as described in the description of FIG. 12. In the third column from the left, under the heading “Cons”, the consensus, or most frequently occurring, amino acid for each represented residue is listed. As representative statistical measures, averages and standard deviations are shown, both weighted and un-weighted by the frequency of each amino acid's occurrence in the sample at each residue represented in this figure. A. Averages and standard deviations are shown for residue van der Waals volumes. B. Averages and standard deviations are shown for residue hydrophobicity quantities.
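The distinction between weighted and un-weighted measures can be made concrete with a small sketch. It assumes population (not sample) standard deviations, and the property values and frequencies below are hypothetical placeholders, not van der Waals volumes or counts from the quantification tables; the name `residue_stats` is also hypothetical.

```python
import math

def residue_stats(frequencies, prop):
    """Return (mean_u, sd_u, mean_w, sd_w) for one residue position.

    frequencies: {amino acid: number of occurrences at this position}
    prop:        {amino acid: numeric property value}
    Un-weighted: each amino acid observed at the position counts once.
    Weighted:    each amino acid counts in proportion to its frequency.
    """
    values = [prop[aa] for aa in frequencies]
    weights = [frequencies[aa] for aa in frequencies]
    n = len(values)
    mean_u = sum(values) / n
    sd_u = math.sqrt(sum((x - mean_u) ** 2 for x in values) / n)
    total = sum(weights)
    mean_w = sum(w * x for w, x in zip(weights, values)) / total
    sd_w = math.sqrt(sum(w * (x - mean_w) ** 2 for w, x in zip(weights, values)) / total)
    return mean_u, sd_u, mean_w, sd_w

# Placeholder property values and frequencies (17 sequences in total).
placeholder_prop = {"V": 100.0, "L": 120.0}
placeholder_freq = {"V": 15, "L": 2}
print(residue_stats(placeholder_freq, placeholder_prop))
```

In this toy case the weighted mean sits close to the value of the dominant amino acid, while the un-weighted mean treats a rare variant and the consensus residue equally.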

FIG. 14. Schematic illustration of a successive array and a parallel array of filters designed for automation using a computer system and software for the residue pair selection process. The filters shown are an illustrative set of filters taken from the filters described above (see Identification of Suitable Residues for the Reaction). In this illustration, the number of selected residues that “passed” each filter, either in succession (left) or in parallel (right), is derived from an analysis of the 106 amino acids of the Fv fragment light chain, the 120 amino acids of the Fv fragment heavy chain, and the resultant 12720 possible residue pairs in a given Fv fragment. The percentages indicating the permissiveness of each filter are also illustrative of the Fv fragment example. See text for further discussion (Software for Selection Process).
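The two filter arrangements can be sketched in a few lines. This is an illustration only: the two filter predicates are hypothetical toy stand-ins rather than the filters of the selection process, and taking the intersection of the parallel results as the final selection is an assumption.

```python
from itertools import product

def successive(pairs, filters):
    """Apply filters one after another; report how many pairs survive each step."""
    remaining = list(pairs)
    counts = []
    for keep in filters:
        remaining = [p for p in remaining if keep(p)]
        counts.append(len(remaining))
    return remaining, counts

def parallel(pairs, filters):
    """Apply every filter to the full set; select pairs that pass all of them."""
    pairs = list(pairs)
    passed = [{p for p in pairs if keep(p)} for keep in filters]
    selected = set.intersection(*passed) if passed else set(pairs)
    return selected, [len(s) for s in passed]

# 120 heavy-chain x 106 light-chain residues = 12720 candidate pairs.
pairs = list(product(range(1, 121), range(1, 107)))

# Hypothetical toy predicates; real filters would test distances, angles, etc.
toy_filters = [lambda p: (p[0] + p[1]) % 2 == 0, lambda p: p[0] % 3 == 0]

kept_successive, step_counts = successive(pairs, toy_filters)
kept_parallel, filter_counts = parallel(pairs, toy_filters)
print(len(pairs), step_counts, filter_counts, len(kept_parallel))
```

Both arrangements select the same pairs; they differ in what the intermediate counts show. The successive counts show cumulative attrition, whereas the parallel counts show the permissiveness of each filter on its own.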

FIG. 15. A. Nucleotide and amino acid sequence of the C. antarctica Lipase B. Both sequences start where the 25 amino acid pre-propeptide is cleaved. B. Sequences of oligonucleotides used for cloning, site-directed mutagenesis, and error-prone PCR, as indicated. The pPal-CALB vector is based on the pPICZalphaA vector, whereby the insert is the N-terminally His-tagged reading frame of the CALB gene, as represented in A, that is cloned into the EcoRI and NotI sites in the multiple cloning site of the vector. The vector pYal-CALB is based on the pYES2.1 V5-His-TOPO vector, whereby the insert is the alpha factor-CALB fusion, containing the N-terminal His-tag, EcoRI and NotI restriction sites, amplified from the pPal-CALB vector. Primers for error-prone PCR allow for directional cloning of the PCR product into the EcoRI and NotI sites in the pYal-CALB vector. All of the constructs are generated by single amino acid substitutions.

FIG. 16. A. Nucleotide and amino acid sequence of Subtilisin E from B. subtilis. B and C. Amino acid sequence alignment of the functionally and structurally related subtilisin enzymes: the middle row represents the sequence of subtilisin E. D. Oligonucleotides used for cloning and site-directed mutagenesis of Subtilisin E, as indicated. The A-Primer hybridizes with the 5′ end of the gene; the B-Primer hybridizes with the 3′ end of the gene and further encodes a C-terminal his(6)-tag for use in affinity purification. The forward and reverse primers indicated are for the constructs 1-7 containing single and double amino acid substitutions. Constructs with double amino acid substitutions are generated by making the first amino acid substitution using the forward and reverse primers X.1, then generating the second substitution using the forward and reverse primers X.

5. DETAILED DESCRIPTION OF THE INVENTION

The invention described herein comprises methods for stabilizing polypeptides and polypeptide complexes. Also provided are polypeptides and polypeptide complexes stabilized using the described methods. By providing specifically localized reactive side-chains, the stabilization reaction is controlled such that the polypeptides and polypeptide complexes maintain their original functionality. The stabilized polypeptides and polypeptide complexes can be maintained and utilized under a wide variety of physiological and non-physiological conditions without exogenous chemical structures that could be immunogenic and/or significantly decrease their efficacy.

By taking a statistical approach to analyzing databases of structural and sequence information for domains of proteins, suitable residue pairs may be identified at which the cross-link reaction is likely to be least disruptive of the overall structure.

At these residues, reactive side-chains are placed via site-directed point mutations. In the polypeptide chains that are to be cross-linked, the codons of potentially reactive side-chains at other positions are also altered to introduce a maximally conservative point mutation that will not support the reaction.

5.1. Polypeptides and Polypeptide Complexes Suitable for Application of the Invention

Polypeptides and polypeptide complexes that can be stabilized by the methods described herein are single polypeptides or complexes that consist of two or more polypeptides and that remain functionally active upon application of the instant invention. Nucleic acids encoding the foregoing polypeptides are also provided. The term “functionally active” material, as used herein, refers to that material displaying one or more functional activities or functionalities associated with one or more of the polypeptides of the complex. Such activities or functionalities may be the polypeptide complexes' original, natural or wild-type activities or functionalities, or they may be designed and/or engineered. Such design and/or engineering may be achieved, for example, by deleting amino acids from, or adding amino acids to, parts of one, any, both, several, or all of the polypeptides, by fusing polypeptides of different polypeptides or polypeptide complexes, by adding or deleting post-translational modifications, by adding chemical modifications or appendixes, or by introducing any other mutations by any methods known in the art to this end, as set forth in detail below.

The compositions may consist essentially of the polypeptides of a complex, and fragments, analogs, and derivatives thereof. Alternatively, the proteins and fragments and derivatives thereof may be a component of a composition that comprises other components, for example, a diluent, such as saline, a pharmaceutically acceptable carrier or excipient, a culture medium, etc.

In specific embodiments, the invention provides fragments of a stabilized polypeptide consisting of at least 3 amino acids or of a stabilized polypeptide complex consisting of at least 6 amino acids, 10 amino acids, 20 amino acids, 50 amino acids, 100 amino acids, 200 amino acids, 500 amino acids, 1000 amino acids, 2000 amino acids, or of at least 5000 amino acids.

5.1.1. Polypeptide Derivatives and Analogs

Derivatives or analogs of proteins include those molecules comprising regions that are substantially homologous to a protein or fragment thereof (e.g., in various embodiments, at least 40% or 50% or 60% or 70% or 80% or 90% or 95% identity over an amino acid or nucleic acid sequence of identical size or when compared to an aligned sequence in which the alignment is done, for example, by a computer homology program known in the art) or whose encoding nucleic acid is capable of hybridizing to a coding gene sequence, under high stringency, moderate stringency, or low stringency conditions.
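A percent identity figure of the kind referred to above can be computed from two already aligned sequences as sketched below. The alignment itself is assumed to come from a separate homology program; the function name `percent_identity`, the gap symbol, and the choice to exclude gapped columns from the denominator are assumptions, and other conventions for the denominator are in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of an alignment where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)

# Placeholder sequences differing at one of ten positions.
print(percent_identity("ACDEFGHIKL", "ACDEYGHIKL"))
```

Under this convention, a candidate derivative would meet, for example, the 90% embodiment if the function returns a value of at least 90.0.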

Further, one or more amino acid residues within the sequence can be substituted by another amino acid of a similar polarity that acts as a functional equivalent, resulting in a silent alteration. Substitutions for an amino acid within the sequence may be selected from other members of the class to which the amino acid belongs. For example, the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and methionine. The polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine. The positively charged (basic) amino acids include arginine, lysine and histidine. The negatively charged (acidic) amino acids include aspartic acid and glutamic acid. Such substitutions are generally understood to be conservative substitutions.
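The four classes listed above translate directly into a lookup, sketched here with one-letter codes. The class memberships are taken from the preceding paragraph; the names `CLASSES`, `residue_class`, and `is_conservative` are hypothetical.

```python
# Amino acid classes as listed in the text, in one-letter code.
CLASSES = {
    "nonpolar": set("ALIVPFWM"),      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}

def residue_class(aa: str) -> str:
    """Return the name of the class to which a one-letter amino acid code belongs."""
    for name, members in CLASSES.items():
        if aa in members:
            return name
    raise ValueError(f"unknown amino acid code: {aa!r}")

def is_conservative(original: str, substitute: str) -> bool:
    """True if both residues fall in the same class, per the classification above."""
    return residue_class(original) == residue_class(substitute)

print(is_conservative("D", "E"), is_conservative("K", "D"))
```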

The derivatives and analogs of the polypeptides of the complex to be stabilized by application of the instant invention can be produced by various methods known in the art. The manipulations that result in their production can occur at the gene or protein level. For example, a cloned gene sequence can be modified by any of numerous strategies known in the art.

Chimeric polypeptides can be made comprising one or several of the polypeptides of a complex to be stabilized by the instant invention, or a fragment, derivative, or analog thereof (preferably consisting of at least a domain of a protein complex to be stabilized, or at least 6, and preferably at least 10 amino acids of the protein) joined at its amino- or carboxy-terminus via a peptide bond to an amino acid sequence of a different protein.

Such a chimeric polypeptide can be produced by any known method, including: recombinant expression of a nucleic acid encoding the polypeptide (comprising a polypeptide coding sequence joined in-frame to a coding sequence for a different polypeptide); ligating the appropriate nucleic acid sequences encoding the desired amino acid sequences to each other in the proper coding frame, and expressing the chimeric product; and protein synthetic techniques, for example, by use of a peptide synthesizer.

5.1.2. Manipulations of a Protein Sequence at the Protein Level

Included within the scope of the invention are polypeptides, polypeptide fragments, or other derivatives or analogs, which are differentially modified during or after translation or synthesis, for example, by glycosylation, acetylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, etc.

Any of numerous chemical modifications may be carried out by known techniques, including but not limited to, specific chemical cleavage by cyanogen bromide, trypsin, chymotrypsin, papain, V8 protease, NaBH₄, acetylation, formylation, oxidation, reduction, metabolic synthesis in the presence of tunicamycin, etc.

In addition, polypeptides, polypeptide fragments, or other derivatives or analogs that can be stabilized using the methods of the instant invention can be chemically synthesized. For example, a peptide corresponding to a portion of a protein can be synthesized by use of a peptide synthesizer. Furthermore, if desired, non-classical amino acids or chemical amino acid analogs can be introduced as substitutions and/or additions into the sequence of one, any, both, several or all of the polypeptides of the complex.

Non-classical amino acids include, but are not limited to, the D-isomers of the common amino acids, fluoro-amino acids, designer amino acids such as β-methyl amino acids, C γ-methyl amino acids, N γ-methyl amino acids, and amino acid analogs in general.

Examples of non-classical amino acids include: α-aminocaprylic acid, Acpa; (S)-2-aminoethyl-L-cysteine•HCl, Aecys; aminophenylacetate, Afa; 6-amino hexanoic acid, Ahx; γ-amino isobutyric acid and α-aminoisobutyric acid, Aiba; alloisoleucine, Aile; L-allylglycine, Alg; 2-amino butyric acid, 4-aminobutyric acid, and Ca-aminobutyric acid, Aba; p-aminophenylalanine, Aphe; β-alanine, Bal; p-bromophenylalanine, Brphe; cyclohexylalanine, Cha; citrulline, Cit; β-chloroalanine, Clala; cycloleucine, Cle; p-chlorophenylalanine, Clphe; cysteic acid, Cya; 2,4-diaminobutyric acid, Dab; 3-amino propionic acid and 2,3-diaminopropionic acid, Dap; 3,4-dehydroproline, Dhp; 3,4-dihydroxylphenylalanine, Dhphe; p-fluorophenylalanine, Fphe; D-glucoseaminic acid, Gaa; homoarginine, Hag; δ-hydroxylysine•HCl, Hlys; DL-β-hydroxynorvaline, Hnyl; homoglutamine, Hog; homophenylalanine, Hoph; homoserine, Hos; hydroxyproline, Hpr; p-iodophenylalanine, Iphe; isoserine, Ise; α-methylleucine, Mle; DL-methionine-S-methylsulfonium chloride, Msmet; 3-(1-naphthyl) alanine, 1Nala; 3-(2-naphthyl) alanine, 2Nala; norleucine, Nle; N-methylalanine, Nmala; norvaline, Nva; O-benzylserine, Obser; O-benzyltyrosine, Obtyr; O-ethyltyrosine, Oetyr; O-methylserine, Omser; O-methylthreonine, Omthr; O-methyltyrosine, Omtyr; ornithine, Orn; phenylglycine; penicillamine, Pen; pyroglutamic acid, Pga; pipecolic acid, Pip; sarcosine, Sar; t-butylglycine; t-butylalanine; 3,3,3-trifluoroalanine, Tfa; 6-hydroxydopa, Thphe; L-vinylglycine, Vig; (−)-(2R)-2-amino-3-(2-aminoethylsulfonyl) propanoic acid dihydrochloride, Aaspa; (2S)-2-amino-9-hydroxy-4,7-dioxanonanoic acid, Ahdna; (2S)-2-amino-6-hydroxy-4-oxahexanoic acid, Ahoha; (−)-(2R)-2-amino-3-(2-hydroxyethylsulfonyl) propanoic acid, Ahsopa; (−)-(2R)-2-amino-3-(2-hydroxyethylsulfanyl) propanoic acid, Ahspa; (2S)-2-amino-12-hydroxy-4,7,10-trioxadodecanoic acid, Ahtda; (2S)-2,9-diamino-4,7-dioxanonanoic acid, Dadna; (2S)-2,12-diamino-4,7,10-trioxadodecanoic acid, Datda; (S)-5,5-difluoronorleucine, Dfnl; (S)-4,4-difluoronorvaline, Dfnv; (3R)-1,1-dioxo-[1,4]thiazinane-3-carboxylic acid, Dtca; (S)-4,4,5,5,6,6,6-heptafluoronorleucine, Hfnl; (S)-5,5,6,6,6-pentafluoronorleucine, Pfnl; (S)-4,4,5,5,5-pentafluoronorvaline, Pfnv; and (3R)-1,4-thiazinane-3-carboxylic acid, Tca. Furthermore, the amino acid can be D (dextrorotatory) or L (levorotatory). For a review of classical and non-classical amino acids, see Sandberg et al. (Sandberg M. et al. J. Med. Chem.; vol. 41(14): pp. 2481-91, 1998).

5.1.3. Molecular Biological Methods

Nucleic acids encoding one or more polypeptides stabilized by the methodology of the instant invention are provided. The polypeptides, their derivatives, analogs, and/or chimers, of the complex can be made by expressing the DNA sequences that encode them in vitro or in vivo by any known method in the art. Nucleic acids encoding one, any, both, several, or all of the derivatives, analogs, and/or chimers of the complex to be stabilized by the methodology of the instant invention can be made by altering the nucleic acid sequence encoding the polypeptide or polypeptides by substitutions, additions (e.g., insertions) or deletions that provide for functionally active molecules. The sequences can be cleaved at appropriate sites with restriction endonuclease(s), followed by further enzymatic modification if desired, isolated, and ligated in vivo or in vitro. Additionally, a nucleic acid sequence can be mutated in vitro or in vivo, to create and/or destroy translation, initiation, and/or termination sequences, or to create variations in coding regions and/or to form new, or destroy preexisting, restriction endonuclease sites to facilitate further in vitro modification.

Due to the degeneracy of nucleotide coding sequences, many different nucleic acid sequences which encode substantially the same amino acid sequence as one, any, both, several, or all of the polypeptides of the complex to be stabilized may be used in the practice of the present invention. These can include nucleotide sequences comprising all or portions of a domain which is altered by the substitution of different codons that encode the same amino acid, or a functionally equivalent amino acid residue within the sequence, thus producing a “silent” (functionally or phenotypically irrelevant) change.
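The degeneracy referred to above can be illustrated by enumerating the coding sequences for a short peptide. The sketch uses a deliberately small subset of the standard genetic code (DNA alphabet); the names `CODONS` and `synonymous_sequences` and the example peptide are hypothetical.

```python
from itertools import product

# Subset of the standard genetic code, sufficient for the example peptide.
CODONS = {
    "M": ["ATG"],
    "W": ["TGG"],
    "Y": ["TAT", "TAC"],
    "F": ["TTT", "TTC"],
    "K": ["AAA", "AAG"],
}

def synonymous_sequences(peptide: str):
    """List every DNA sequence (within CODONS) that encodes the given peptide."""
    choices = [CODONS[aa] for aa in peptide]
    return ["".join(codons) for codons in product(*choices)]

# Met-Tyr-Lys: 1 x 2 x 2 = 4 distinct coding sequences, all "silent" variants.
for seq in synonymous_sequences("MYK"):
    print(seq)
```

The number of synonymous sequences grows multiplicatively with peptide length, which is why many different nucleic acids can encode the same polypeptide.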

Any technique for mutagenesis known in the art can be used, including but not limited to, chemical mutagenesis, in vitro site-directed mutagenesis, using, for example, the QuikChange Site-Directed Mutagenesis Kit (Stratagene), etc.

5.2. Applications of the Stabilization Technology

The polypeptide and polypeptide complex stabilization methods of the invention have broad applicability. Some non-limiting examples are set forth below.

5.2.1. General

Polypeptide complexes which are held together in nature by domains that mediate protein-protein interactions may be stabilized using the methods of the invention. Further, single polypeptide chains may be stabilized using the methods of the invention to engineer intra-chain di-tyrosine cross-links. For example, hormones (e.g. insulin, erythropoietin, human growth hormone or bovine growth hormone), other growth factors (e.g. insulin-like growth factors, neurotrophic factors), and enzymes and/or biosensors and biocatalysts can be stabilized, either alone or together as a complex with a receptor or other protein binding partner (McInnes C. and Sykes B. D. Biopolymers; vol. 43(5): pp. 339-66, 1997). Examples of protein-protein interaction domains which may be stabilized using the methods of the invention include, but are not limited to, leucine-zipper domains (Alber T. Curr. Opin. Genet. Dev.; vol. 2(2): pp. 205-10, 1992), SH2 and SH3 domains (Pawson T. Princess Takamatsu Symp.; vol. 24: pp. 303-22, 1994), PTB and PDZ domains (Cowburn D. Curr. Opin. Struct. Biol.; vol. 7(6): pp. 835-8, 1997; Bockaert J. and Pin J. P. EMBO J.; vol. 18(7): pp. 1723-9, 1999), WD40 domains (Royet J. et al. EMBO J.; vol. 17(24): pp. 7351-60, 1998), death and death effector domains (Strasser A. and Newton K. Int. J. Biochem. Cell. Biol.; vol. 31(5): pp. 533-7, 1999), disintegrin domains (Black R. A. and White J. M. Curr. Opin. Cell Biol.; vol. 10(5): pp. 654-9, 1998), and CARD domains (Chou J. J. et al. Cell; vol. 94(2): pp. 171-80, 1998).

Proteins which dimerize or multimerize to function may be stabilized using the methods of the invention. Such proteins include most immunoglobulin complexes, including the fragments that retain immunoglobulin functionality, such as, for example, Fab, F(ab)₂, Fc, and Fv fragments (Penuche M. L. et al. Hum. Antibodies; vol. 8(3): pp. 106-18, 1997; Sensel M. G. et al. Chem. Immunol.; vol. 65: pp. 129-58, 1997). Most cell-surface receptors that transmit extracellular signals to intracellular signaling systems dimerize and contain some of the above mentioned domains that mediate protein-protein interactions (McInnes C. and Sykes B. D. Biopolymers; vol. 43(5): pp. 339-66, 1997; Guogiang J. et al. Nature; vol. 401: pp. 606-610, 1999). Further examples are intracellular protein complexes, such as, for example, the caspases (Chou J. J. et al. Cell; vol. 94(2): pp. 171-80, 1998).

Growth factors which may be stabilized using the methods of the invention include, but are not limited to, those that dimerize to function, such as interleukin-8 (Leong S. R. et al. Protein Sci.; vol. 6(3): pp. 609-17, 1997) and members of the NGF/TGF family. These proteins are generally characterized as having 110-120 amino acid residues and up to 50% homology with each other, and are used for the treatment of a variety of health disorders, such as cancer, osteoporosis, spinal cord injury and neuronal regeneration. Examples of the NGF family include, but are not limited to, NGF, BDNF, NT-3, NT-4/5, and NT-6, TRAIL, OPG, and FasL polypeptides (Lotz M. et al. J. Leukoc. Biol.; vol. 60(1): pp. 1-7, 1996; Casaccia-Bonnefil P. et al. Microsc. Res. Tech.; vol. 45(4-5): pp. 217-24, 1999; Natoli G. et al. Biochem. Pharmacol.; vol. 56(8): pp. 915-20, 1998). TRAIL is currently in clinical trials, and may be useful to induce apoptosis in cancer cells. OPG is also in clinical trials and may be useful to strengthen bone tissue and prevent bone loss during menopause (Wickelgren I. Science; vol. 285(5430): pp. 998-1001, 1999).

Growth factors that do not dimerize to function and that may be stabilized using the methods of the invention include, but are not limited to, polypeptides that can be stabilized by introducing intra-chain di-tyrosine bonds, such as, for example, insulin, erythropoietin, any of the colony stimulating factors (CSFs), and PDGF.

Industrial biocatalytic processes are used in many industry sectors, including the chemical, detergent, pharmaceutical, agricultural, food, cosmetics, textile, materials-processing, and paper industries. Within these industries, biocatalysts have many applications, ranging from product synthesis (e.g. amino acid manufacturing, and fine chemical synthesis of small-molecule pharmaceuticals) through use as active agents in products (for example, in biological washing powders) to use in diagnostic testing equipment. Biocatalysts also have industrial applications that range from wastewater and agricultural soil treatment, to crude oil refinement.

Enzymes that may be stabilized using the methods of the invention include, but are not limited to, enzymes with applications as catalysts in basic, applied, or industrial research, or industry sectors, that include, for example, but are not limited to, the chemical, detergent, pharmaceutical, agricultural, food, cosmetics, textile, materials-processing, and paper industries. Within such industry sectors, enzymes, or biocatalysts, may be applied in any way, or have any kind of utility, such as, but not limited to, product synthesis, use as active agents in products, use in diagnostic testing equipment, or any other applications that may include, but are not limited to, wastewater and agricultural soil treatment, and crude oil refinement. Examples of synthetic applications include, but are not limited to, amino acid manufacturing and fine chemical synthesis. Examples of biocatalytic applications as active agents in products include, but are not limited to, such applications as biological washing powders.

Biocatalysts may be derived from enzymes of any class, family, or any other categorization of enzymes, including, but not limited to, oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases, polymerases, lipases, esterases, proteases, glycosidases, glycosyl transferases, phosphatases, kinases, monooxygenases, dioxygenases, transaminases, amidases, and acylases; they may comprise a single polypeptide chain, or two or more polypeptide chains of a polypeptide complex.

A biosensor is defined as a device that consists of a biological recognition system, often called a bioreceptor, and a transducer. The interaction of the analyte with the bioreceptor is designed to produce an effect measured by the transducer, which converts the information into a measurable effect, such as an electrical signal. A biochip consists of an array of individual biosensors that can be individually monitored and generally are used for the analysis of multiple analytes. A bioreceptor can be a biological molecular species (e.g., an antibody, an enzyme, or a protein) that utilizes a biochemical mechanism for recognition. Common forms of bioreceptors used in biosensing are based on antibody/antigen and enzymatic interactions. Biosensors are widely applied in biological monitoring and environmental sensing. Furthermore, significant advances are being made in their use in the analysis of samples of biomedical interest (Vo-Dinh and Cullum. Fresenius J. Anal. Chem.; vol. 366: pp. 540-551, 2000). As described above, enzymes and immunoglobulin-derived polypeptides and polypeptide complexes can be stabilized by application of the instant invention. The improvements that stabilization of these molecules provides, as described above, are also of significant relevance to their use in biosensors and biochips.

The technology described herein can be applied alone, or in combination with other technologies. In one embodiment, the technology can be applied in combination with one or more alternative technologies that provide additional stability for the protein or protein complex. In another embodiment, the technology described herein can be applied in combination with one or more alternative technologies that provide additional beneficial attributes to the protein or protein complex. In yet another embodiment, the technology may be applied in combination with a single alternative technology that both stabilizes and provides additional beneficial attributes. In yet another embodiment, the technology may be applied in combination with two or more technologies, at least one of which provides additional stability, and at least one of which provides at least one additional attribute.

Combinations of technologies often lead to synergistic effects, i.e. the combination of technologies is more effective than the sum of the effects of the individual technologies applied individually. Synergies may be observed with regard specifically to stabilization, for example, but not limited to, by combining application of the instant invention with an in vitro evolutionary approach or immobilization strategies (see below).

Alternative technologies that provide additional stability when applied in combination with the instant technology include, but are not limited to, generating fusion proteins, such as, for example, single chain Fv fragments (scFv's; see Pluckthun and Pack, Immunotechnology; vol. 3(2): pp. 83-105, 1997); protein derivatization, such as, for example, PEGylation (Wright and Morrison. Trends Biotechnol.; vol. 15(1): pp. 26-32, 1997; DeSantis & Jones. Curr. Opin. Biotech., vol. 10(4) pp. 324-330, 1999); disulfide cross-linking, generating such products as disulfide stabilized biocatalysts (Illanes. Elec. J. Biotech., vol. 2(1): pp. 7-15, 1999) or Fv fragments (dsFv's; Reiter and Pastan. TIBTECH; vol. 16(12): pp. 513-520, 1998; Reiter et al. Nat Biotech.; vol. 14: pp. 1239-1245, 1996); other cross-link methodologies, such as, for example, generating cross-linked enzyme crystals by glutaraldehyde cross-linking (CLECs; Govardhan. Curr. Opin. Biotech., vol. 10(4) pp. 331-334, 1999; Haring and Schreier. Curr. Opin. Chem. Biol., vol. 3(1): pp. 35-38, 1999; Illanes. Elec. J. Biotech., vol. 2(1): pp. 7-15, 1999); other immobilization strategies, such as, for example, embedding biocatalysts in gels, such as polyacrylamide (Illanes. Elec. J. Biotech., vol. 2(1): pp. 7-15, 1999); medium engineering, such as, for example, use of a biocatalyst in organic or aqueous-organic solvents (Carrea G. and Riva S. Angew. Chem. Int. Ed. Engl; vol. 39(13): pp. 2226-2254, 2000); and any in vitro evolution strategies, such as, for example, directed evolution by DNA shuffling (Stemmer. Nature, vol. 370: pp. 389-391, 1994; Zhao and Arnold. Nucleic Acids Res. vol. 25: pp. 1307-1308, 1997; Zhao et al. Nat. Biotechnol., vol. 16: pp. 258-261, 1998; Shao et al. Nucleic Acids Res. vol. 26: pp. 681-683).

Technologies that may provide additional beneficial attributes to a polypeptide or polypeptide complex when applied in combination with the instant technology include, but are not limited to, generating fusion proteins, such as, for example, hetero-specific diabodies or Fv fragments fused to cytotoxins; protein derivatization, such as, for example, PEGylation; medium engineering, such as, for example, use of a biocatalyst in an organic or aqueous-organic solvent; and any in vitro evolution strategies, such as, for example, directed evolution by DNA shuffling (see above).

Technologies can be applied simultaneously either by incorporating the process of the other technology or technologies in the process of applying the instant invention, or vice versa. This would be the case, as a non-limiting example, when applying an in vitro evolutionary approach in combination with the instant technology, such as described in Example II, Chapter 7. Alternatively, technologies can be applied in any succession that best meets the requirements and circumstances of a specific application.

5.2.2. Immunoglobulin Fv Fragments

Antibodies or immunoglobulin molecules (Ig) are among the most therapeutically useful molecules. Their utility results from their ability to bind to given target molecules with extremely high specificity and affinity. Their function in the immune system is to bind to foreign molecules (such as those present on the surface of pathogens) and to trigger the removal of these foreign molecules from the body using a variety of effector mechanisms.

With the advent of hybridoma technology, based on the work of G. Köhler and C. Milstein in the mid-1970s, it has become possible to engineer pure clones of cells expressing a single antibody. The utility of such monoclonal antibodies (MAbs), whose unique binding specificity can be characterized in detail, is vast. From a monoclonal population of antibody-producing cells it is possible to isolate the genes encoding the polypeptide chains that make up the antibody. Efficient large-scale production of recombinant immunoglobulin in yeast or bacterial expression systems is an active interest of the biotechnology industry. More importantly, however, molecular biological techniques allow us to manipulate these genes and thereby produce antibody-derived proteins custom-tailored to individual applications, such as those described below.

One of the major limitations to the clinical effectiveness of antibodies is their size. Full-length immunoglobulin molecules are effective as humoral agents, but their size makes it difficult for them to penetrate tissues such as solid tumors. As a result, smaller, engineered versions of antibodies have been designed. Such engineered antibodies are designed to retain normal functional specificity with respect to antigen binding in a much smaller molecule, while at the same time uncoupling this binding function from the immunoglobulin molecule's other biological effector functions (e.g. complement activation or macrophage binding, FIG. 1D).

Fv fragments have been shown to be the smallest Ig-derived fragments that retain full binding specificity (FIG. 1D). The Fv fragment essentially comprises only those amino acid sequences of the antibody molecule that constitute the “variable domain” responsible for antigen binding. Due to their minimal size, Fv fragments show significantly better tissue penetration and can therefore be used in a broader range of contexts (e.g. solid tumor therapy). As used herein, Fv fragments shall include the variable region of immunoglobulin molecules or the equivalent or homologous region of a T cell receptor.

Amino acid sequence comparisons of the 110-120 residue long V_(H) and V_(L) regions reveal that each is made up of four relatively conserved sequence segments, called the “Framework Regions” (FRs), and three highly variable sequence segments, called “Complementarity Determining Regions” (CDR I, II, & III), which largely determine the specificity of the antibody (FIG. 1D, “right arm”).

The heavy and light chain Fv fragment polypeptides associate with each other largely at sites within the conserved FRs. Fv fragments, however, lack the structure-stabilizing inter-chain disulfide bonds present in the Ig constant regions. In order to keep recombinant Fv heavy and light chains associated and achieve functional stability and affinity, the two chains of the molecule must be “stabilized” by some other means.

5.3. Biocatalysts

Biocatalysts are a preferred class of catalysts for industrial process development, due to their high specificity and process yields. Specifically, they allow for the use of less energy and less expensive feedstocks (starting materials), reduce the number of individual steps leading to a product, and reduce waste products. Their commercial use is, however, still limited by instability, curtailing key applications. This invention provides methods for stabilizing such enzymes, improving their performance as industrial catalysts, and prolonging their half-lives and shelf-lives. Application of the instant invention also enables the industrial use of novel, previously unstable, biocatalysts, and thereby also shortens industrial process innovation cycle times.

Specifically, application of the instant invention stabilizes biocatalysts, for example, by preventing the unfolding of the protein. This increases their ability to catalyze chemical reactions under adverse reaction conditions, prolongs their half-lives and shelf-lives, and maximizes their activity at milder, actual process temperatures.

5.4. Obtaining Polypeptides to be Stabilized

Any method known to one skilled in the art may be used to obtain a polypeptide or polypeptide complex to be stabilized according to the methods of the invention.

5.4.1. Purification of Polypeptides

A polypeptide or polypeptide complex to be stabilized using the methods of the instant invention may be obtained, for example, by any protein purification method known in the art. Such methods include, but are not limited to, chromatography (e.g. ion exchange, affinity, and/or sizing column chromatography), ammonium sulfate precipitation, centrifugation, differential solubility, or any other standard technique for the purification of proteins. A polypeptide may be purified from any source that produces it. For example, polypeptides may be purified from sources including prokaryotic, eukaryotic, mono-cellular, multi-cellular, animal, plant, fungus, vertebrate, mammalian, human, porcine, bovine, feline, equine, canine, avian, tissue culture cells, and any other natural, modified, engineered, or otherwise not naturally occurring source. The degree of purity may vary, but in various embodiments, the purified protein is greater than 50%, 75%, 85%, 95%, 99%, or 99.9% of the total mg protein. Thus, a crude cell lysate would not comprise a purified protein.

Where it is necessary to introduce one or more tyrosine residues to be cross-linked into a purified polypeptide or polypeptide complex, the polypeptide(s) can be micro-sequenced to determine a partial amino acid sequence. The partial amino acid sequence can then be used together with library screening and recombinant nucleic acid methods well known in the art to isolate the clones necessary to introduce tyrosines.

5.4.2. Expression of DNA Encoding a Polypeptide

Source of DNA

Any prokaryotic or eukaryotic cell can serve as the nucleic acid source for molecular cloning. A nucleic acid sequence encoding a protein or domain to be cross-linked or stabilized may be isolated from sources including prokaryotic, eukaryotic, mono-cellular, multi-cellular, animal, plant, fungus, vertebrate, mammalian, human, porcine, bovine, feline, equine, canine, avian, etc.

The DNA may be obtained by standard procedures known in the art from cloned DNA (e.g., a DNA “library”), by chemical synthesis, by cDNA cloning, or by the cloning of genomic DNA, or fragments thereof, purified from the desired cell (see e.g., Sambrook et al.; Glover (ed.). MRL Press, Ltd., Oxford, U.K.; vol. I, II, 1985). The DNA may also be obtained by reverse transcribing cellular RNA, prepared by any of the methods known in the art, such as random- or poly A-primed reverse transcription. Such DNA may be amplified using any of the methods known in the art, including PCR and 5′ RACE techniques (Weis J. H. et al. Trends Genet. 8(8): pp. 263-4, 1992; Frohman M. A. PCR Methods Appl. 4(1): pp. S40-58, 1994).

Whatever the source, the gene should be molecularly cloned into a suitable vector for propagation of the gene. Additionally, the DNA may be cleaved at specific sites using various restriction enzymes, DNAse may be used in the presence of manganese, or the DNA can be physically sheared, as for example, by sonication. The linear DNA fragments can then be separated according to size by standard techniques, such as agarose and polyacrylamide gel electrophoresis and column chromatography.

Cloning

Once the DNA fragments are generated, identification of the specific DNA fragment containing the desired gene may be accomplished in a number of ways. For example, clones can be isolated by using PCR techniques that may either use two oligonucleotides specific for the desired sequence, or a single oligonucleotide specific for the desired sequence, using, for example, the 5′ RACE system (Cale J. M. et al. Methods Mol. Biol.; vol. 105: pp. 351-71, 1998; Frohman M. A. PCR Methods Appl.; vol. 4(1): pp. S40-58, 1994). The oligonucleotides may or may not contain degenerate nucleotide residues. Alternatively, if a portion of a gene or its specific RNA or a fragment thereof is available and can be purified and labeled, the generated DNA fragments may be screened by nucleic acid hybridization to the labeled probe (e.g. Benton and Davis. Science; vol. 196(4286): pp. 180-2, 1977). Those DNA fragments with substantial homology to the probe will hybridize. It is also possible to identify the appropriate fragment by restriction enzyme digestion(s) and comparison of fragment sizes with those expected according to a known restriction map if such is available. Further selection can be carried out on the basis of the properties of the gene.

The presence of the desired gene may also be detected by assays based on the physical, chemical, or immunological properties of its expressed product. For example, cDNA clones, or DNA clones which hybrid-select the proper mRNAs, can be selected and expressed to produce a protein that has, for example, similar or identical electrophoretic migration, isoelectric focusing behavior, proteolytic digestion maps, hormonal or other biological activity, binding activity, or antigenic properties as known for a protein.

Using an antibody to a known protein, other proteins may be identified by binding of the labeled antibody to expressed putative proteins, for example, in an ELISA (enzyme-linked immunosorbent assay)-type procedure. Further, using a binding protein specific to a known protein, other proteins may be identified by binding to such a protein either in vitro or in a suitable cell system, such as the yeast-two-hybrid system (see e.g. Clemmons D. R. Mol. Reprod. Dev.; vol. 35: pp. 368-374, 1993; Loddick S. A. et al. Proc. Natl. Acad. Sci., U.S.A.; vol. 95: pp. 1894-1898, 1998).

A gene can also be identified by mRNA selection using nucleic acid hybridization followed by in vitro translation. In this procedure, fragments are used to isolate complementary mRNAs by hybridization. Such DNA fragments may represent available, purified DNA of another species (e.g., Drosophila, mouse, human). Immunoprecipitation analysis or functional assays (e.g. aggregation ability in vitro, binding to receptor, etc.) of the in vitro translation products of the isolated mRNAs identifies the mRNA and, therefore, the complementary DNA fragments that contain the desired sequences.

In addition, specific mRNAs may be selected by adsorption of polysomes isolated from cells to immobilized antibodies specifically directed against the protein. A radiolabeled cDNA can be synthesized using the selected mRNA (from the adsorbed polysomes) as a template. The radiolabeled mRNA or cDNA may then be used as a probe to identify the DNA fragments from among other genomic DNA fragments.

Alternatives to isolating the genomic DNA include chemically synthesizing the gene sequence itself from a known sequence, or making cDNA to the mRNA which encodes the protein. For example, RNA for cDNA cloning of the gene can be isolated from cells that express the gene.

Vectors

The identified and isolated gene can then be inserted into an appropriate cloning or expression vector. A large number of vector-host systems known in the art may be used. Possible vectors include plasmids or modified viruses, but the vector system must be compatible with the host cell used. Such vectors include bacteriophages such as lambda derivatives, or plasmids such as pBR322 or pUC plasmid derivatives or the Bluescript vector (Stratagene).

The insertion into a cloning vector can, for example, be accomplished by ligating the DNA fragment into a cloning vector that has complementary cohesive termini. However, if the complementary restriction sites used to fragment the DNA are not present in the cloning vector, the ends of the DNA molecules may be enzymatically modified. Alternatively, any site desired may be produced by ligating nucleotide sequences (linkers) onto the DNA termini; these ligated linkers may comprise specific chemically synthesized oligonucleotides encoding restriction endonuclease recognition sequences. Furthermore, the gene and/or the vector may be amplified using PCR techniques and oligonucleotides specific for the termini of the gene and/or the vector that contain additional nucleotides that provide the desired complementary cohesive termini. In alternative methods, the cleaved vector and a gene may be modified by homopolymeric tailing (Cale J. M. et al. Methods Mol. Biol.; vol. 105: pp. 351-71, 1998). Recombinant molecules can be introduced into host cells via transformation, transfection, infection, electroporation, etc., so that many copies of the gene sequence are generated.

Preparation of DNA

In specific embodiments, transformation of host cells with recombinant DNA molecules that incorporate an isolated gene, cDNA, or synthesized DNA sequence enables generation of multiple copies of the gene. Thus, the gene may be obtained in large quantities by growing transformants, isolating the recombinant DNA molecules from the transformants and, when necessary, retrieving the inserted gene from the isolated recombinant DNA.

The sequences provided by the instant invention include those nucleotide sequences encoding substantially the same amino acid sequences as found in native proteins, and those encoded amino acid sequences with functionally equivalent amino acids, as well as those encoding other derivatives or analogs, as described below for derivatives and analogs.

Structure of Genes and Proteins

The amino acid sequence of a protein can be derived by deduction from the DNA sequence, or alternatively, by direct sequencing of the protein, for example, with an automated amino acid sequencer.

A protein sequence can be further characterized by a hydrophilicity analysis (Hopp T. P. and Woods K. R. Proc. Natl. Acad. Sci., U.S.A.; vol. 78: pp. 3824, 1981). A hydrophilicity profile can be used to identify the hydrophobic and hydrophilic regions of the protein and the corresponding regions of the gene sequence which encode such regions.

Secondary structural analysis (Chou P. Y. and Fasman G. D. Biochemistry; vol. 13(2): pp. 222-45, 1974) can also be done, to identify regions of a protein that assume specific secondary structures. Manipulation, translation, secondary structure prediction, open reading frame prediction and plotting, as well as determination of sequence homologies, can also be accomplished using computer software programs available in the art. Other methods of structural analysis include X-ray crystallography, nuclear magnetic resonance spectroscopy and computer modeling.

5.5. Suitable Residues for a Cross-Linking Reaction

The identification and/or engineering of suitable residues for a cross-linking reaction may involve one or more of the several steps set forth below.

5.5.1. Introduction of Point Mutations to Control the Cross-Link Reaction

Engineering the overall structure and function of a stabilized polypeptide or polypeptide complex is achieved by controlling the availability of tyrosyl side-chains for the cross-linking reaction, for example, but not limited to, via mutagenesis. Functionality of a polypeptide or polypeptide complex may be compromised or altered by a tyrosine-tyrosine cross-link reaction. In this case, an undesirable hydroxyl group of a tyrosyl side-chain may be removed by mutating such residues to phenylalanine, or masked to inhibit its participation in such a reaction. In this way, a tyrosyl residue that is available for the cross-linking reaction, but that may lead to distortion of structure and compromise functionality and/or specificity of the polypeptide or polypeptide complex, is removed. Moreover, point mutations to tyrosine may be introduced at positions where the tyrosyl side-chains will react with each other to form a bond that causes the least distortion to structure and function; these positions are identified as described in detail below. Thereby, the overall structure and functionality of the polypeptide or polypeptide complex is maintained.

5.5.2. Removing Undesirable Reactive Side-Chains

Reactive side-chains in a polypeptide chain, or in the polypeptide chains of a complex, are identified that, when subjected to the conditions of the oxidative cross-link described above, would result in a bond that would distort the structure of the complex. These residues are identified by comparison of the polypeptides' amino acid sequences to available structural information on such or similar complexes (see below). Such a bond can be formed either between two polypeptide chains of the complex (inter-chain bond) or between two residues of one and the same polypeptide chain (intra-chain bond). The effect of the formation of a bond is determined by both of the reactive side-chains involved in the formation of such a bond, and therefore these residues are identified in pairs.

To neutralize this damaging effect of the cross-link reaction, masking reagents that protect aromatic side chains (Pollitt S. and Schultz P. Angew. Chem. Int. Ed.; vol. 37(15): pp. 2104-2107, 1998) may be used, or amino acid substitutions to phenylalanine, or any other amino acid, may be introduced at least at one of the residues involved, for example, by introducing a point mutation in the cDNA of the gene directing the expression of the polypeptide.
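As a non-limiting illustration of such a point mutation at the cDNA level, the following minimal sketch replaces a tyrosine codon with a phenylalanine codon. In the standard genetic code, tyrosine is encoded by TAT and TAC and phenylalanine by TTT and TTC, so the substitution requires a single base change at the second codon position. The function name `mutate_tyr_to_phe` and the example sequence are hypothetical and are not taken from this disclosure.

```python
# Illustrative sketch only: the helper name and the example cDNA are hypothetical.

# Tyr -> Phe requires one base change (A -> T at the second codon position).
TYR_TO_PHE = {"TAT": "TTT", "TAC": "TTC"}


def mutate_tyr_to_phe(cdna: str, residue_number: int) -> str:
    """Return the coding sequence with the Tyr codon at the given
    1-based residue number replaced by the corresponding Phe codon."""
    start = 3 * (residue_number - 1)
    codon = cdna[start:start + 3].upper()
    if codon not in TYR_TO_PHE:
        raise ValueError(
            f"residue {residue_number} is encoded by {codon!r}, not a Tyr codon"
        )
    return cdna[:start] + TYR_TO_PHE[codon] + cdna[start + 3:]


example_cdna = "ATGGCTTACAAA"                # encodes Met-Ala-Tyr-Lys
mutant_cdna = mutate_tyr_to_phe(example_cdna, 3)  # encodes Met-Ala-Phe-Lys
print(mutant_cdna)
```

The reverse substitution (introducing a tyrosine at a selected position) could be sketched the same way with the mapping inverted.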

5.5.3. Introducing Reactive Side-Chains

To achieve a stabilized polypeptide or polypeptide complex without disrupting its structure and/or function, positions within each polypeptide are identified at which a reactive side-chain would be able to form a bond with a reactive side-chain on the, or one of the, other polypeptide chain(s). Such positions are selected both with respect to maintaining the overall structure of the same polypeptide, and with respect to the suitability of a position in the other polypeptide involved in the bond, and the positions are therefore selected in pairs (see below for a detailed description of the selection process).

When the reactive tyrosyl side-chain is not already present at a selected residue of either, or any, polypeptide(s), a point mutation may be introduced, for example, but not limited to, by using molecular biological methods to introduce such a point mutation into the cDNA of the gene directing its expression, such that a reactive side-chain is present and available for the reaction.

5.6. Structurally Conserved Domains

5.6.1. Relationship Between Structure and Function

It is the three-dimensional, or tertiary, structure of every protein, and the quaternary structure of every protein complex, that lends them the functionality that has allowed them to be maintained and developed through the evolutionary process over time. A point mutation in the gene of a polypeptide or polypeptide complex that leads to an amino acid substitution at any given residue will alter the structure of the polypeptide and/or of the overall complex to a greater or lesser extent. The extent of such an amino acid substitution's effect on the structure of the polypeptide or polypeptide complex is dependent on the structural context of the residue, and on the nature of the resultant amino acid's side-chain.

Protein domains that show extensive similarity in their amino acid sequences to domains in other proteins are referred to as “conserved domains”. Within conserved domains, some individual residues are more conserved than others; some can be 100% conserved, and others not at all. Most conserved domains are similar not only in their amino acid sequences, but also in their three-dimensional structures and in their functions. In the absence of evolutionary pressures that require a residue of a domain to be conserved, it is thought that the amino acid present at a residue would vary widely due to the rate of mutation that drives evolutionary diversification. Hence, the residues within a conserved domain that are highly conserved are thought to be important contributors to the overall structure, or the architecture, of the domain. Among the residues that are less conserved are those that contribute to the specificity of the individual domain of the group.

Conserved domains, however, can also show very little sequence homology and yet have conserved structures, such as, for example, leucine zippers (Alber T. Curr. Opin. Genet. Dev.; vol. 2(2): pp. 205-10, 1992). Since a conserved structure also yields structurally conserved residues, the distinction between the above described ‘architectural’ and ‘specificity determining’ residues can also be made in the absence of sequence conservation. For the purposes of the instant invention, a conserved domain is defined, depending on the availability of data, either by sequence homology, which can be as low as 5% identity or similarity, or by the structure or function of the group of domains.

5.6.2. Alignment of Conserved Residues

Alignment of the two-dimensional sequences of conserved domains further reveals that conserved residues are frequently interspersed with chains of varying lengths, i.e. there are varying numbers of amino acid residues between the conserved residues important for the overall structure of the domain. In order to compare the sequences of individual domains to determine where to direct the cross-link reaction, it is essential that the sequences are aligned in such a way that amino acids that correspond structurally to one another are compared. For residues identified from amino acid and nucleotide sequence analyses as highly conserved, this is easily accomplished.
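By way of a non-limiting sketch, once sequences have been aligned with gap characters, each alignment column can be mapped back to the residue number it holds in each domain, so that structurally corresponding residues are compared even when the intervening chains differ in length. The function name `column_to_residue`, the gap symbol, and the two short example sequences are assumptions made for illustration only.

```python
# Illustrative sketch only: names, gap symbol, and sequences are hypothetical.

def column_to_residue(aligned: str, gap: str = "-") -> list:
    """For each alignment column, return the 1-based residue number
    occupying that column, or None where the sequence has a gap."""
    numbers = []
    count = 0
    for symbol in aligned:
        if symbol == gap:
            numbers.append(None)
        else:
            count += 1
            numbers.append(count)
    return numbers


# Two pre-aligned domain sequences; domain_a lacks two residues.
alignment = {
    "domain_a": "CQAS--YW",
    "domain_b": "CRASGTYW",
}

maps = {name: column_to_residue(seq) for name, seq in alignment.items()}

# Columns in which every domain has a residue can be compared directly.
n_columns = len(alignment["domain_a"])
shared_columns = [
    col for col in range(n_columns)
    if all(m[col] is not None for m in maps.values())
]

for col in shared_columns:
    print(col, {name: m[col] for name, m in maps.items()})
```

In this example the tyrosine in alignment column 6 is residue 5 of the first domain but residue 7 of the second, which is why comparisons are made by column and not by raw residue number.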

5.7. Statistical Selection Method

Structural comparisons of proteins and protein complexes can inform the identification of important residues, and the determination of the suitability of a residue or group of residues for modifications that are intended not to disrupt the fold, structure, and/or function of the protein or protein complex. Disclosed is a method of statistically evaluating sets of data related to the amino acid sequence, the structure, and/or function/functionality of related polypeptides for the purpose of identifying important residues, or suitable residues for modification, within a protein or protein complex of interest, or a group of related proteins or protein complexes of interest.

Given the availability of relevant data, it is often possible to assign quantitative values for certain characteristics of an amino acid side chain present at each residue of a domain, polypeptide, or polypeptide complex. Furthermore, given the relevant data on domains, polypeptides, or polypeptide complexes, it is possible to give groups of amino acids values that describe their structural and/or functional relationship. These values can be compared between individual domains by aligning the data in such a way that the sets of values to be compared are structurally and functionally related (see above). If there is a sufficient number of individual domains, polypeptides, or polypeptide complexes, for which such data is available, it is possible to analyze these sets of data statistically.

Statistical analysis of sets of data provides information concerning the degree of structural conservation and/or variability of a residue or a group of residues in a sample, and an indication of the extent to which a residue or a group of residues is involved in providing the underlying architecture, or the specificity, of a domain. This information is derived from statistical measurements that include, but are not limited to, a given value's average, variance, standard deviation, range, maximum, and minimum. For example, high variance or standard deviation measurements of a certain value imply high variability of that value for a residue or a group of residues, and thus a low degree of conservation, and vice versa.
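The measurements named above can be sketched, in a non-limiting way, as follows. The numeric values are invented solely for illustration (they could stand for any aligned quantity, such as a distance for a residue pair measured in four related structures), and the use of the population variance and standard deviation is an assumption, since no particular estimator is prescribed here.

```python
import statistics

# Illustrative sketch only: the data values below are hypothetical.


def summarize(values):
    """Return the descriptive statistics named in the text for one
    aligned residue (or residue pair) across a sample of domains."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.pvariance(values),  # population variance (assumed)
        "stdev": statistics.pstdev(values),        # population std. dev. (assumed)
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
    }


# One list per aligned residue pair; one entry per related structure.
measurements = {
    "pair_1": [5.0, 5.0, 5.0, 5.0],   # identical in every structure
    "pair_2": [4.0, 6.0, 8.0, 10.0],  # varies between structures
}

summary = {pair: summarize(vals) for pair, vals in measurements.items()}

# Lower standard deviation is read as higher conservation.
ranked = sorted(summary, key=lambda pair: summary[pair]["stdev"])

for pair in ranked:
    print(pair, summary[pair])
```

Here the first pair, with zero variance, would be read as highly conserved, and the second, with the larger spread, as more variable.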

From the measurements that are made on a set of data, it is possible to make predictions for the suitability of residues, or groups of residues, in related domains, polypeptides, or polypeptide complexes that are, and that are not, present in the sample. A residue that is highly conserved in a sample of related polypeptides with regard to one or more relevant sets of data has a high likelihood of having similarity in all individual polypeptides, including those not present in the sample. Therefore, using statistical analyses to identify important residues and/or to determine which residues are suitable for modification lends this methodology a higher degree of general applicability.

Potential applications of this methodology include, but are not limited to, structure-function analyses of polypeptides or polypeptide complexes, that include, for example, but are not limited to, determining the importance of one or more side-chains of a residue or a group of residues in either the active site of an enzyme, the protein-protein interaction surface of a polypeptide or polypeptide complex, the substrate binding pocket of an enzyme, and/or the binding pocket of an inhibitor.

Furthermore, as described below, this methodology can be applied to identify residues or groups of residues that are suitable for modifications that include, but are not limited to, the substitution of one or more amino acids (for example, by site-directed mutagenesis) and/or chemical modification. Non-limiting examples of such modifications include substitutions of amino acids to cysteines toward the formation of disulfide bonds; substitution of amino acids to tyrosine and subsequent chemical treatment of the polypeptide toward the formation of dityrosine bonds, as disclosed in detail herein; one or more amino acid substitutions and/or chemical modification toward generating a binding pocket for a small molecule (substrate or inhibitor); and/or the introduction of side-chain specific tags (e.g. to characterize molecular interactions or to capture protein-protein interaction partners).

The selection of residues and/or residue pairs to which a modification can be directed to stabilize a polypeptide or polypeptide complex functionally is preferably carried out by statistically analyzing data on several polypeptide or polypeptide complex structures of a group of conserved domains or polypeptides, and selecting the residue pairs based on selection criteria, such as those developed and described below.

5.8. Generation and Use of Databases

5.8.1. Generating Data Relevant to the Selection Criteria

The increasing availability of data concerning the genes, proteins, and other bio-molecules of many living species makes it possible to compile a significant amount of data on several protein domains/modules for statistical analyses to make predictions, as described above. This data can be transformed into data that can be utilized for such analyses directly.

Such transformations can, for instance, be done by converting nucleotide data into amino acid sequence data, and further by converting amino acid sequence data into numeric data concerning the physical properties of the amino acids' side-chains of a given residue. Such properties, for instance, can be the charge or the degree of hydrophobicity of a residue's side-chains (see below).
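A minimal, non-limiting sketch of these two transformations follows. It translates a coding sequence with the standard genetic code and then converts the resulting amino acid sequence into numeric values. The choice of the Kyte-Doolittle hydropathy scale and of a simplified integer side-chain charge at neutral pH (histidine treated as uncharged) are assumptions made for illustration; any other scale could be substituted. All function names and the example sequence are hypothetical.

```python
from itertools import product

# Illustrative sketch only: function names, scales, and the example
# sequence are assumptions, not part of the original disclosure.

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge at neutral pH; His is treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def translate(cdna: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    seq = cdna.upper()
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def to_numeric(protein: str):
    """Return per-residue (hydropathy, charge) lists."""
    hydropathy = [HYDROPATHY[aa] for aa in protein]
    charge = [CHARGE.get(aa, 0) for aa in protein]
    return hydropathy, charge


protein = translate("ATGGCTTACAAAGATTAA")
hydropathy, charge = to_numeric(protein)
print(protein, hydropathy, charge)
```

The resulting per-residue lists are in a form that can be aligned between domains and analyzed statistically.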

Furthermore, structural data of a polypeptide or of two or several polypeptides in a complex can be transformed into numeric data that describes the structural relationship of the individual residues with the other residues of the polypeptide or those of the other polypeptide(s) in the complex. An example of such a transformation would be the calculation of the distances between the alpha carbons of a residue pair, using Pythagorean three-dimensional geometry, from three-dimensional coordinate data derived from crystallographic resolution of a polypeptide's or a complex's structure.
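As a non-limiting sketch of this calculation, the distance between two alpha carbons with coordinates (x1, y1, z1) and (x2, y2, z2) is the square root of (x1 - x2)² + (y1 - y2)² + (z1 - z2)². The coordinates, residue labels, and function names below are invented for illustration and do not come from any solved structure.

```python
import math

# Illustrative sketch only: coordinates and residue labels are hypothetical.


def ca_distance(a, b):
    """Euclidean distance between two alpha-carbon coordinates (x, y, z),
    in the same units as the input (e.g. Angstrom)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Alpha-carbon coordinates for a few residues of two chains of a complex.
chain_1 = {"res_10": (1.0, 2.0, 3.0), "res_11": (0.0, 0.0, 0.0)}
chain_2 = {"res_40": (4.0, 6.0, 3.0), "res_41": (0.0, 3.0, 4.0)}

# Inter-chain distance for every residue pair.
pair_distances = {
    (r1, r2): ca_distance(c1, c2)
    for r1, c1 in chain_1.items()
    for r2, c2 in chain_2.items()
}

for pair, distance in sorted(pair_distances.items()):
    print(pair, round(distance, 2))
```

Repeating the calculation for the same aligned residue pairs in several related structures yields the sets of values to which the statistical measurements described above can be applied.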

It is possible to generate many different sets of data relevant for the stabilization according to the procedure of this invention concerning many of the structural features of the residues and residue pairs of a domain or a complex. As often more qualitative judgements are required to determine the reliability of the selection inputs, it also becomes a more qualitative decision how many different sets of data should be used in the identification or selection of residues or groups of residues. The less reliable the inputs, the more useful it is to implement additional information in the selection.

5.8.2. Data Sources

Sequence Data

The most direct way of accumulating sequences is by cloning and sequencing cDNAs of proteins that contain the domains/modules of interest. Sequence data is becoming more and more available through the efforts of the genome projects. Much of the sequence data is available in databases that can be accessed through the internet, or otherwise, and furthermore there are several published sources that have accumulated sequences of specific domains/modules. One such collection of specific sequence data is the Kabat Database of Sequences of Proteins of Immunological Interest (http://immuno.bme.nwu.edu; Johnson, G. et al. Weir's Handbook of Experimental Immunology 1. Immunochemistry and Molecular Immunology, Fifth Edition, Ed. L. A. Herzenberg, W. M. Weir, and C. Blackwell, Blackwell Science Inc., Cambridge, Mass., Chapter 6.1-6.21, 1996) that contains, among other things, sequences of immunoglobulin molecules (see Sections 6-8, Examples). Such sequence data is also available from GenBank (http://www.ncbi.nlm.nih.gov).

Structural Data

Three-dimensional structures, as described by atomic coordinate data, of a polypeptide or complex of two or more polypeptides can be obtained in several ways.

The first approach is to mine databases of existing structural co-ordinates for the proteins of interest. The data of solved structures is often available on databases that are easily accessed in the form of three-dimensional coordinates (x, y, and z) in Angstrom (10⁻¹⁰ m) units. Often this data is also accessible through the internet (e.g. on-line protein structure database of the Brookhaven National Laboratory: www.nbl.pdb.gov).

The second utilizes diffraction patterns (by, for example, but not limited to, X-rays or electrons) of regular 2- or 3-dimensional arrays of proteins, as for example used in the field of X-ray crystallography. Computational methods are used to transform such data into 3-dimensional atomic co-ordinates in real space.

The third utilizes Nuclear Magnetic Resonance (NMR) to determine inter-atomic distances of molecules in solution. Multi-dimensional NMR methods combined with computational methods have succeeded in determining the atomic co-ordinates of polypeptides of increasing size.

A fourth approach consists entirely of computational modeling. Algorithms may be based on the known physico-chemical nature of amino acids and bonds found in proteins, or on iterative approaches that are experimentally constrained, or both. An example of software is the CNS program developed by Axel Brunger and colleagues at the HHMI at Yale University (Adams P. D. et al. Acta Crystallogr. D. Biol. Crystallogr.; vol. 55 (Pt 1): pp. 181-90, 1999).

Functional Data

Functional data is not as easily used, as there is no uniform way of standardizing and compiling it, as there is for nucleotide or amino acid sequence data, or coordinates for structural data. It is generated in many different ways, such as genetic, biochemical, and mutational analyses, molecular biological dissection and the construction of chimeric domains. In many cases the data available is not clearly interpretable and therefore its use becomes less clearly delineated. But when available, functional data provides valuable information concerning the specificity and functionality of a domain/module, and where possible is preferably incorporated into the selection process.

Functional data is preferably also generated after the cross-link reaction according to the present invention to ensure that the predictions made were accurate for the specific application, and that the polypeptide or polypeptide complex actually retained its functionality and specificity.

5.8.3. Construction of Databases

3-D Database

A database of structural information including the atomic coordinate data of crystallographically solved polypeptides and polypeptide complexes of a group of conserved polypeptides or domains and their ligands, and derivative, relevant data is compiled. Input data is derived from structural coordinate data files. Data relevant to the selection process in this database is derived from coordinate data by applying coordinate geometry in three dimensions. This database preferably contains, for example, in addition to the structural coordinate data, the following, relevant data together with statistical measurements (e.g. mean, median, mode, standard deviation, maximum, and minimum) on each of the following features for each residue pair, whereby the sample polypeptides or polypeptide complexes are aligned as described above.

1. Inter-chain alpha carbon to alpha carbon distances of the polypeptide pair(s) of a polypeptide or complex, in order to find residue pairs that are appropriately spaced for a tyrosyl-tyrosyl bond to be formed. These distances are calculated by, for instance, but not limited to, applying Pythagorean geometry to the 3D coordinates of the alpha carbons. For every residue pair statistical measurements are calculated, such as the average, standard deviation, range and median of corresponding alpha carbon-alpha carbon distances.

2. The three angles, φ, ψ and χ (FIG. 2C), in relation to which the side-chains of each residue pair are oriented toward each other relative to the inter-chain alpha carbon-alpha carbon axes, are calculated from the coordinates of the alpha and beta carbons of each pair for each polypeptide or polypeptide complex in the sample. The angles are calculated by defining two planes, each of which is defined by both alpha carbon positions and one of the beta carbons' positions. By applying analytical geometry, each of the angles in the alpha carbons (scalar products), and the angle formed by the planes (vector products) are calculated. Statistical measurements are also made from this set of data, as described for the alpha carbon spacing.

The difference between the alpha carbon distance (i.e. the backbone carbon distance) and the beta carbon distance (i.e. the distance between the first carbons in each side chain) of each residue pair can also be calculated as a proxy of the orientation of the side chains relative to each other (see below).
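The geometric quantities listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used: the function name `pair_geometry` is hypothetical, χ is reported as the unsigned angle between the two plane normals (0° when both beta carbons lie on the same side of the alpha carbon-alpha carbon axis), and the alpha-beta distance difference is taken as the alpha carbon distance minus the beta carbon distance.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def angle_deg(a, b):
    """Angle between two vectors from their scalar product, in degrees."""
    cosine = dot(a, b) / (norm(a) * norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def pair_geometry(ca1, cb1, ca2, cb2):
    """phi, psi, chi (degrees) and distances (Angstrom) for one residue pair.

    Fails with ZeroDivisionError if a beta carbon lies exactly on the
    alpha carbon-alpha carbon axis, since the plane is then undefined.
    """
    axis = sub(ca2, ca1)
    phi = angle_deg(axis, sub(cb1, ca1))            # angle at alpha carbon 1
    psi = angle_deg(sub(ca1, ca2), sub(cb2, ca2))   # angle at alpha carbon 2
    normal1 = cross(axis, sub(cb1, ca1))            # plane through CA1, CA2, CB1
    normal2 = cross(axis, sub(cb2, ca2))            # plane through CA1, CA2, CB2
    chi = angle_deg(normal1, normal2)               # angle between the planes
    d_alpha = norm(axis)
    d_beta = norm(sub(cb2, cb1))
    return {"phi": phi, "psi": psi, "chi": chi,
            "d_alpha": d_alpha, "d_beta": d_beta,
            "alpha_beta_difference": d_alpha - d_beta}


if __name__ == "__main__":
    print(pair_geometry((0, 0, 0), (1, 1, 0), (10, 0, 0), (9, 1, 0)))
```

Under this sign convention a positive alpha-beta distance difference indicates side-chains that point toward each other.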

2-D Database

A database of DNA or amino acid sequences of polypeptides or polypeptides involved in complexes of a kind, including residue side-chain usage from sequence data and derivative, relevant data is compiled. Data relevant to the selection process in this database is derived from sequence data by applying a numeric value representing the physical properties of every occurring amino acid side chain at each residue, whereby the sample polypeptides or polypeptide complexes are aligned as described above. This database contains, for example, in addition to sequence data, the following, relevant data together with statistical measurements (e.g. mean, median, mode, standard deviation, maximum, and minimum) on each of the following features for each residue pair. The statistical measurements can be made and stored on the occurring amino acids at each residue both weighted and un-weighted by the frequency at which the specific side chain occurs at this residue.

1. Numeric data concerning the bulk/volume of residues' side chains, such as, but not limited to, chemical composition, molecular weight and van der Waals volumes (Xia X. and Li W. H.; Richards, F. M.).

2. Numeric data concerning the polarity of the residues' side-chains, such as, but not limited to, charge, isoelectric point, and hydrophobicity (Xia X. and Li W. H.; Eisenberg, D.).

Examples of other amino acid side chain property measurements that can be incorporated in such a database and analyzed are aromaticity, aliphaticity, hydrogenation, and hydroxythiolation (Xia X. and Li W. H.).
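The distinction between frequency-weighted and un-weighted statistics at an aligned position can be illustrated as follows. This is a sketch only: `position_stats` is a hypothetical helper, the column of residues is invented for the example, and the three hydropathy values are taken from the Kyte-Doolittle scale as a stand-in for whatever property scale is stored in the database.

```python
import statistics
from collections import Counter

# Stand-in property scale (Kyte-Doolittle hydropathy, three residues only).
SCALE = {"L": 3.8, "A": 1.8, "Y": -1.3}


def position_stats(column, scale):
    """Summarize one aligned position; `column` holds one residue per sample."""
    counts = Counter(column)
    unweighted = [scale[aa] for aa in counts]   # each occurring residue type once
    weighted = [scale[aa] for aa in column]     # each occurrence counts
    return {
        "frequencies": {aa: n / len(column) for aa, n in counts.items()},
        "unweighted_mean": statistics.mean(unweighted),
        "weighted_mean": statistics.mean(weighted),
        "weighted_stdev": statistics.stdev(weighted),
    }


if __name__ == "__main__":
    # Invented example column: leucine dominates this position.
    print(position_stats(["L", "L", "L", "A", "Y"], SCALE))
```

The weighted mean is pulled toward the dominant residue type, whereas the un-weighted mean treats every residue type seen at the position equally.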

Database of Functional Data

Where it is possible to obtain functional data that indicates the importance of a residue/residue pair for the polypeptide's or polypeptide complex's overall structure and/or specificity, it is preferably incorporated into the selection process, as it enhances the accuracy of the statistical predictions made. Such data is preferably quantified, to whatever degree possible, with respect to individual residues and/or residue pairs of a polypeptide or complex, or with respect to sub-domains or domains that mediate protein folding or protein-protein interactions, and compiled in a suitable database.

5.8.4. Required Sample Size (N)

Often the availability of data is limiting for this approach. However, to make statistical measurements on a sample of polypeptides or polypeptide complexes in order to identify residues or select residues or groups of residues for modification, it is best to use a large sample, as it will yield more accurate predictions. But often it is very labor-intensive accumulating and/or aligning the data in such a way that measurements become meaningful (see above). Since there is always a limited range of values, and since therefore their variability is also limited, accurate predictions can also be made from smaller sets of data. A sample with more than 15 individual structures, sequences or functional units is preferable.

However, methods have previously been used to position other cross-links, such as di-sulfide bonds, by examining only the one polypeptide or complex in which the point mutations are to be made, and this has resulted in functional complexes (Pastan et al., U.S. Pat. No. 5,747,654 issued May 5, 1998). Therefore it is possible to make predictions that can be accurate on a small sample. However, in order to make predictions based on statistics that include such measurements as standard deviations, it is not meaningful to use a sample size less than three (a standard deviation on 2 points of data is not a meaningful measurement). Therefore the minimum sample size is three for any statistical analysis.
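The sample-size guidance above can be made explicit when summary statistics are computed. The following sketch assumes a hypothetical helper `summarize` that refuses samples smaller than three and flags whether the preferred size of more than 15 is reached; the constant names are likewise illustrative.

```python
import statistics

MIN_SAMPLE = 3         # below this a standard deviation is not meaningful
PREFERRED_SAMPLE = 15  # more than this many samples is preferable


def summarize(values):
    """Summary statistics for one feature of one residue pair across the sample."""
    n = len(values)
    if n < MIN_SAMPLE:
        raise ValueError(f"need at least {MIN_SAMPLE} samples, got {n}")
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "preferred_size": n > PREFERRED_SAMPLE,
    }


if __name__ == "__main__":
    print(summarize([9.0, 10.0, 11.0]))
```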

5.9. Selection Process

5.9.1. Selection Criteria for Amino Acid Substitutions

Structural Suitability

The object of such analyses is to determine which residue pairs will be most suited for the cross-link reaction in order to maintain the structure, function, and specificity of a polypeptide or polypeptide complex. Therefore, many of the criteria the residue pairs are selected for relate to the pairs' potential to accommodate two cross-linked reactive side-chains without distorting the peptide-bond backbone and altering the structure of the polypeptide or complex at positions that enable and define its function and specificity.

Measurements that can be made to attain information concerning this potential relate to the determinants of the space available for the reactive side-chains and the bond. Such measurements include the distance between the residue pairs' alpha-carbons, which are the carbon atoms that are a part of the “backbone” formed by the peptide bonds between all amino acids of the polypeptide. The selected residue pairs should have an average alpha-carbon distance close to the distance that the alpha-carbons of the cross-linked tyrosyl side-chains would be from each other if point mutations were introduced, and the cross-link reaction were directed to that residue pair. The selected residue pairs should be sufficiently close to the alpha-carbon distance of cross-linked tyrosyl side-chains to ensure that the functionality of the polypeptide or polypeptide complex is maintained. The criteria for this selection are described in detail below (Selection Process: Determination of the Alpha Carbon Distance in the Tyrosyl-tyrosyl Bond, The Filters). Since the variability of a residue pair's structural characteristics is also an important criterion in the selection of suitable residue pairs for the cross-link reaction (see below), the required proximity to the optimal distance is calculated for each residue pair, dependent on the variability of its alpha-carbon distances in the sample. The calculation of this requirement is also described in detail below (Selection Process: The Filters).

Measurements can also be made to determine whether the protein will fold in such a way that the reactive side-chains will be directed toward each other. Selection criteria can be developed based on the angles of the reactive side-chains and of the cross-link, the rotational freedom of the reactive side-chains, and measurements concerned with the three-dimensional geometrical relationship between the alpha-carbons and the beta-carbons of each residue pair. The beta carbon is the first carbon atom of the amino acid side-chains not part of the backbone. Such selection criteria are described in detail below (Selection Process: Calculations of Side-chain Angles in the Tyrosyl Bond, The Filters). The smallest amino acid, glycine, does not have a beta-carbon, and therefore residue pairs of which one or both of the amino acids is a conserved glycine cannot be analyzed in this way. Since mutation of a conserved glycine would likely lead to a significant structural distortion, residue pairs of which one or both residues are a conserved glycine are eliminated. This selection criterion is also described in detail below (Selection Process: The Filters).

Furthermore, the structural context of the residue pair is preferably considered to ascertain the availability of three-dimensional space for the reactive side-chains and the bond. The relevant amino acid side-chain characteristics of proximal residues therefore are preferably taken into account, to further substantiate that the reactive side-chains will be able to rotate such that the bond can be formed without distorting the polypeptide backbone. If the context is such that the reactive side-chains introduced by point mutation will not be able to rotate freely into the desired position, the bond will either not readily be formed, or distortions will occur that could potentially impair or alter the function and/or specificity of the polypeptide or polypeptide complex. Therefore, selection criteria are developed to allow more conservative point mutations to be introduced that will be less likely to cause structural distortions. Such criteria are based on the amino acids present at, and surrounding, the residues of a pair, and are quantified based on numeric values of the physical properties of those amino acid side-chains. The calculation of such requirements is described in detail below (Selection Process: The Filters).

If a suitable residue pair can be identified that already carries an appropriate reactive amino acid on both chains at some frequency in the sample, this pair would be an ideal selection. However, reactive side-chains present in the polypeptide or polypeptides of the complex to be cross-linked that would cause structural distortions by forming either inter- or intra-chain bonds should be neutralized, either by a means of masking/protecting them (Pollitt S. and Schultz P. Angew. Chem. Int. Ed.; vol. 37(15): pp. 2104-2107, 1998) or by introducing maximally conservative point mutations. Such reactive residue pairs are identified using the same criteria as for the positive selection of residue pairs suitable for cross-linking. However, the presence of undesirable side-chains can only be determined by analyzing the specific sequence of an individual domain, and by comparing it with the structural information used for the positive selection.

Variability

The specificity of each individual domain and its counterpart in the same protein or in another protein of a complex is generally determined by residues that are less, or not, conserved. Therefore, considering the specificity of an individual domain, a residue with high variability can be a less desirable choice to which to direct the cross-link reaction. However, considering the overall structure and architecture of a domain, the architecture of the domain can more likely accommodate a mutation at a residue that exhibits a high degree of variability. Thus, from this perspective, high variability indicates that a residue is a better candidate at which to introduce a point mutation, and place a reactive side-chain.

Depending on the reliability and accuracy of these analyses, which, in turn, depend on the reliability of the inputs into the analyses (see below), it is possible to vary the requirement for a position's, or a pair's, variability (which indicates a certain degree of flexibility and/or robustness). Thus, if the inputs are highly accurate, and sufficient data is present in the sample, it is possible to determine that a residue pair is highly suitable for the reaction although its variability is low. However, in cases where there is insufficient data or insufficient accuracy in the inputs for the analyses to allow for low variability, a residue that is important for the specificity, but not for the overall architecture of the domain, may be selected. In the absence of functional data it is very difficult to determine a residue's contribution to the specificity of the domain.

5.9.2. Determination of the Alpha Carbon Distance in the Tyrosyl-Tyrosyl Bond

As stated above, selected residue pairs should have an average alpha-carbon distance close to the distance of the alpha-carbons of cross-linked tyrosyl side-chains. The range of distances that is possible between the alpha carbons of two cross-linked tyrosines is calculated for the epsilon-epsilon bonded isoform of the cross-link by applying standard geometry, Pythagorean geometry, and trigonometry. The calculations are based on all carbon-carbon bonds of the dityrosine forming 120 degree angles due to the planar structure of the aromatic ring, with the exception of the angle in the beta carbon, which forms the tetrahedral angle of 109.5 degrees (FIG. 2A).

Furthermore, these calculations take into consideration that the structure of the dityrosine has significant degrees of rotational freedom, and that therefore the distance between the alpha carbons of the two tyrosines can be quite different depending on its conformation. Specifically, the rotational freedoms in the beta carbon-gamma carbon bonds, and the rotational freedom in the bond linking the aromatic rings are considered. Other isoforms of the cross-link are, however, possible, which would enable even closer distances between the alpha-carbons of the dityrosine, which is further taken into consideration in setting the possible ranges in the selection process of the residue pairs, as described below in the “Filters”.

The angle χ in FIG. 2C is the angle formed by the two planes, each defined by the alpha carbon-alpha carbon axis, and individually by the positions of each of the beta carbons of the two tyrosyl side-chains involved in the bond. The angle ω, determined by the rotational freedom in the dityrosine bond itself, is 120° in FIG. 3, and −120° in FIG. 4.

The schematic depictions of possible bond configurations for an angle ω of 120° in FIG. 3 represent an angle χ of 180°, at which both the maximal and minimal angles are in the projected plane. The schematic depictions of possible bond configurations for an angle ω of −120° in FIG. 4 represent an angle χ of 0°, at which both the maximal and minimal angles are in the projected plane.

For an angle ω of 120° and an angle χ of 180°, and in the configuration at which the alpha carbon distance is at a minimum (FIG. 3A), the alpha carbon distance is 9.56 Å; in the configuration in which the alpha carbon distance is at a maximum (FIG. 3B), the alpha carbon distance is 11.74 Å.

For an angle ω of −120° and an angle χ of 0°, and in the configuration at which the alpha carbon distance is at a minimum (FIG. 4A), the alpha carbon distance is 5.70 Å; in the configuration in which the alpha carbon distance is at a maximum (FIG. 4B), the alpha carbon distance is 10.73 Å.

5.9.3. Calculations of Side-Chain Angles in the Tyrosyl Bond

The angles φ and ψ (FIG. 2C) are the angles in each of the alpha carbon atoms between the alpha carbon-alpha carbon axis and the alpha carbon-beta carbon bond. They are calculated for the maximum and minimum distances between the alpha carbon atoms based on the rotational flexibility of the carbon-carbon bonds in the beta carbon atom.

The schematic depictions of possible bond configurations for an angle ω of 120° in FIG. 3 represent an angle χ of 180°, at which both the maximal and minimal angles are in the projected plane. The schematic depictions of possible bond configurations for an angle ω of −120° in FIG. 4 represent an angle χ of 0°, at which both the maximal and minimal angles are in the projected plane.

For an angle ω of 120° and an angle χ of 180°, and in the configuration at which the alpha carbon distance is at a minimum (FIG. 3A), the angles φ and ψ are maximal and equal at approximately 77.1°; in the configuration in which the alpha carbon distance is at a maximum (FIG. 3B), the angles φ and ψ are minimal and equal, at approximately 34.5°.

For an angle ω of −120° and an angle χ of 0°, and in the configuration at which the alpha carbon distance is at a minimum (FIG. 4A), the angles φ and ψ are maximal and equal at 130.5°; in the configuration in which the alpha carbon distance is at a maximum (FIG. 4B), the angles φ and ψ are minimal and equal, at 10.5°.

Differences in the Alpha-Alpha and Beta-Beta Distances

As a proxy for the orientation of the side-chains, the difference in the alpha-alpha and beta-beta distances (“alpha-beta distance difference”) and its range are calculated, again based on the extremes of alpha carbon spacing for angles ω of 120° and −120° (FIGS. 3 and 4). The maximum and minimum of the alpha-beta distance difference are calculated for both ω angles at which both aromatic rings of the tyrosyl side-chains are in the same plane, and at which the alpha-beta distance difference is at its extremes. This difference is calculated by subtracting twice the length a from twice the length b in FIGS. 3 and 4.

For an angle ω of 120° (FIG. 3), and in the configuration at which the alpha carbon distance is maximal, the alpha-beta distance difference is 2.37 Å; in the configuration at which the alpha carbon distance is minimal, the alpha-beta distance difference is 0.19 Å. For an angle ω of −120° (FIG. 4), and in the configuration at which the alpha carbon distance is maximal, the alpha-beta distance difference is 3.03 Å; in the configuration at which the alpha carbon distance is minimal, the alpha-beta distance difference is −2.00 Å.

5.10. The Filters

In cases where sufficient data is available, the selection process preferably consists of a series of statistical tests or “filters” aimed at successively narrowing down the residue pairs most likely to result in an inter-chain cross-linked tyrosine pair of a polypeptide or polypeptide complex that minimally alters the polypeptide's or polypeptide complex's structural characteristics.

Where it is not possible, or is inconvenient, to obtain the required data for statistical analyses, residue pairs can also be selected in any other way, including, for example, trial and error. Such selection processes yield residue pairs to which the cross-link can be directed while maintaining the functionality of the polypeptide or polypeptide complex.

An example of a successive set of filters is the following:

1. Selection based on residue pair alpha carbon spacing, based on (1) the calculated maximal and minimal distances in a cross-linked tyrosine pair (see above), and (2) the distances measured and compiled in a 3-D database. The selection is carried out on the average, median, mode, or any other statistical value suitable to determine whether the pair is likely to be spaced in such a way that the cross-link will minimally distort the overall structure. The optimal range of residue pair alpha carbon distances to be selected is determined by averaging first the minimal distances in a cross-linked tyrosine pair of the isoform depicted in FIG. 2B for ω angles of 120° and −120°, and then, analogously, averaging the maximal distances, as calculated above. These calculations result in the following optimal range:

-   -   Min: 7.63 Å, Max: 11.24 Å.

Since distances are possible in a larger range, and because other isoforms are also possible that would allow for configurations with zero distance, the average between a zero-distance and the minimal distance between alpha carbons for either angle ω provides the lower limit and the maximal distance between alpha carbons for either angle ω provides the upper limit of the preferred range. Therefore, the preferred range is:

-   -   Min: 2.85 Å, Max: 11.74 Å

Furthermore, it has been demonstrated in several cases that a protein structure can often absorb a certain amount of structural changes, and that the specificity and functionality is nonetheless maintained. It is therefore also possible, though less preferred, to introduce the reactive side-chains into residue pairs that are spaced even beyond the preferred range. Given this degree of structural flexibility the largest range possible is:

-   -   Min: 0 Å, Max: 13.74 Å.

2. Selection based on positional flexibility is carried out, as examples, on the measured/calculated standard deviations or ranges of the alpha-carbon distances in the sample, or any other statistical measure that quantifies the variability of the pairs' distances measured/calculated and compiled in a 3-D database. The range for this selection is preferably set in such a way that the average measured alpha-carbon distance of the selected residue pairs is within less than one standard deviation of the preferred range. However, 2 standard deviations are also possible as a selection criterion.

3. Selection based on side-chain orientation, determined either by calculating the three-dimensional angles relative to the alpha carbon-alpha carbon axis (ψ, φ and χ angles, as described in FIG. 2C), or by calculating a proxy, e.g. an estimate of the orientation based on the alpha-beta distance difference described above. The selection is carried out on the average, median, mode, or any other statistical value of the angles, or the proxy, suitable to determine whether the side-chains of the pair are likely to be oriented such that the cross-link will minimally distort the overall structure.

The angle χ can vary by 360°, and the bond is still possible without any distortion of the structure, so long as the angles ω and φ adjust correspondingly. Therefore, the selection range based on the angle χ should be set by a metric driven by the angles ψ, φ, and χ with a degree of flexibility similar to that for the angles ψ and φ, or for the alpha-beta distance difference, the range for which is described below.

The range for the angles ψ and φ is, analogous to the optimal range of alpha carbon distances in Filter 1, optimally between the averages of the extreme values calculated for the isoform of the dityrosine pair depicted in FIG. 2B, and for ω angles of 120° and −120°. This optimal range is thus between:

-   -   Min: 22.49°, Max: 103.80°.

Since these angles are possible in a larger range even within this one isoform of the dityrosine bond, and since the above optimal range is often too restrictive, the minimal angle for either angle ω provides the lower limit and the maximal angle for either angle ω provides the upper limit of the preferred range. Therefore, the preferred range is:

-   -   Min: 10.5°, Max: 130.5°.

Furthermore, it has been demonstrated in several cases that a protein structure can often absorb a certain amount of structural changes, and that the specificity and functionality is nonetheless maintained. It is therefore also possible, though less preferred, to introduce the reactive side-chains into residue pairs that have angles ψ and φ even beyond the preferred range. Given this degree of structural flexibility the largest range possible is:

-   -   Min: 0°, Max: 140°.

The optimal range of residue pair alpha-beta distance differences to be selected is determined by averaging first the minimal alpha-beta distance difference in a cross-linked tyrosine pair of the isoform depicted in FIG. 2B, for ω angles of 120° and −120°, and then, analogously, averaging the maximal alpha-beta distance difference, as calculated above. These calculations result in the following optimal range:

-   -   Min: −0.90 Å, Max: 2.70 Å.

Since distance differences are possible in a larger range, and since the above optimal range is often too restrictive, the minimal alpha-beta distance difference for either angle ω provides the lower limit and the maximal alpha-beta distance difference for either angle ω provides the upper limit of the preferred range. Therefore, the preferred range is:

-   -   Min: −2.00 Å, Max: 3.03 Å.

Furthermore, it has been demonstrated in several cases that a protein structure can often absorb a certain amount of structural changes, and that the specificity and functionality is nonetheless maintained. Furthermore, other isoforms of the dityrosine bond are possible. It is therefore also possible, though less preferred, to introduce the reactive side-chains into residue pairs that have an alpha-beta distance difference even beyond the preferred range. Given this degree of structural flexibility the largest range possible is:

-   -   Min: −2.75 Å, Max: 3.08 Å.

4. The flexibility of the side-chains' orientation toward each other is measured on the standard deviation or range of the sample, as examples, or any other statistical measure that quantifies the variability of the side-chains of the pairs measured and compiled in a 3-D database. The range for this selection is preferably set in such a way that the average measured alpha-beta distance difference of the selected residue pairs is within less than one standard deviation of the preferred range. However, 2 standard deviations are also possible as a selection criterion.

5. Pairs in which one or both residues are 95% or more, preferably 80% or more, possibly also 50% or more, conserved among the domains in the sample are eliminated, as they are likely to be important for the overall architecture of the domain, e.g. cysteines in the formation of di-sulfide bonds, leucines in the formation of leucine zippers, etc.

6. Side-chain physical properties, e.g. charge, hydrophobicity, van der Waals volumes, molecular weight, etc. The selection is carried out on the average, median, mode, or any other statistical value of these properties, individually or combined, suitable to determine whether the mutations to tyrosine and the cross-link between a residue pair will minimally distort the overall structure. The degree to which a residue is conserved is measured by the standard deviation or range, as examples, or any other statistical measure of the sample that quantifies the variability of the side-chains' physical properties, which are measured and compiled in a 2-D database.

The range can be set, as an example, in the following manner: the value of a physical property for a tyrosine pair (2× value of tyrosine) is compared with the combined value of both residues of a pair, and the difference is obtained by subtraction. The difference is then compared with the combined standard deviations of the residue pair. A multiple smaller than 2 of the combined standard deviations should make up for the difference between the value of a tyrosine pair and the combined averages of the residue pair. However, more direct or intuitive measures, as well as more sophisticated and accurate measures, can also be used to score and select for physical properties of residue pairs.

7. Elimination of pairs of which one or both residues are glycines conserved at 90% or more, preferably at 60% or more. Glycine is the smallest of the amino acids and has no beta carbon. Glycine is often associated with turns in protein structures, and substitution of a glycine with one of the largest amino acids, tyrosine, would likely have too great an impact on the overall structure.

8. The above structural and/or amino acid side-chain conservation and/or physical properties of residues/residue pairs proximal to each residue/residue pair. Proximity can be determined with regard to both the polypeptide sequences (2-D) and the overall structure of the polypeptide or polypeptide complex (3-D).

9. Functional properties concerning the effect of a residue/residue pair on the functionality and/or specificity of the polypeptide or polypeptide complex.
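The conservation filters (5 and 7) and the range rule given under filter 6 can be sketched as follows. This is a minimal illustration only, not the method as implemented by the inventors. It assumes that each residue position is represented by the list of amino acids found at that aligned position across the sampled domains, and that "combined" means the sum of the two per-residue values; the function names, the default thresholds passed as parameters, and the numeric property values are hypothetical.

```python
from statistics import mean, stdev


def residue_fraction(column, residue):
    """Fraction of sampled domains carrying `residue` at this aligned position."""
    return column.count(residue) / len(column)


def max_conservation(column):
    """Fraction held by the most frequent amino acid at this position."""
    return max(residue_fraction(column, r) for r in set(column))


def passes_conservation(col_a, col_b, max_conserved=0.95, max_glycine=0.90):
    """Filters 5 and 7: reject a pair if either position is highly conserved,
    or is a conserved glycine. Thresholds are adjustable parameters."""
    for column in (col_a, col_b):
        if max_conservation(column) >= max_conserved:
            return False
        if residue_fraction(column, "G") >= max_glycine:
            return False
    return True


def passes_property_range(values_a, values_b, tyrosine_value, multiple=2.0):
    """Range rule of filter 6 (assumed reading): the difference between the
    tyrosine-pair value (2 x tyrosine) and the combined averages of the pair
    must be covered by less than `multiple` combined standard deviations."""
    difference = abs(2 * tyrosine_value - (mean(values_a) + mean(values_b)))
    combined_sd = stdev(values_a) + stdev(values_b)  # assumption: simple sum
    return difference < multiple * combined_sd


if __name__ == "__main__":
    # Hypothetical aligned columns and hypothetical property values.
    print(passes_conservation(list("AAVLI"), list("SSTTA")))
    print(passes_property_range([100, 110, 120], [120, 130, 140], 135))
```

A stricter selection is obtained by lowering `max_conserved` toward the preferred 80% or 50% levels, or `max_glycine` toward 60%.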

5.10.1. Incorporation of Data Derived from Modeling

Particularly in embodiments of the instant invention in which a single polypeptide is stabilized, such as, for example, a peptide growth factor or a biocatalyst, any of the methods known in the art may be employed to calculate and/or compute the effects of the mutations and/or the cross-link on the structure, stability, activity, or specificity of the resultant polypeptide. One example of such a software package is the above mentioned CNS (Adams P. D. et al. Acta Crystallogr. D. Biol. Crystallogr.; vol. 55 (Pt 1): pp. 181-90, 1999) using the CHARM energy minimization plug-in. Data derived from such analyses may be used to further narrow down the selection of residue pairs, and may also be used to inform the settings of the selection parameters, such as, for example, the selection ranges.

5.10.2. Minimally Required Filters for Selection

Depending on the nature of the polypeptide or polypeptide complex, and on the availability of data, a subset of filters can, however, suffice to select a suitable pair for the cross-link reaction. For instance, a filter based on the average of residue alpha carbon spacing (Filter 1, above) can be used alone. It is also possible to make a selection using the above filters 6 and 7, both based on the degree to which residues are conserved, if structural data is available for at least one structure of such a polypeptide or polypeptide complex. Any one or more of the above filters, and any combination thereof, can be used for the selection.

The order of the filters is not of importance. Furthermore, where it would add to the quality of the selection, the above filters can be split into two or more filters to stress certain aspects of the filter. Filters can additionally be combined by designing metrics that quantify several criteria simultaneously. Thereby, for instance, the selection can be refined further by selecting on one criterion while taking the value of another criterion into account.
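As an illustration of why the order of the filters does not matter, each filter can be treated as an independent yes/no test on a candidate residue pair, and a pair is selected only if it passes every test applied. The sketch below is hypothetical: the record fields `ca_distance` and `conservation`, the helper names, and the threshold values are assumptions made for the example and are not taken from the specification.

```python
from itertools import permutations


def spacing_filter(max_distance):
    """Hypothetical filter on average alpha carbon spacing of a pair."""
    return lambda pair: pair["ca_distance"] <= max_distance


def conservation_filter(max_conserved):
    """Hypothetical filter rejecting pairs with a highly conserved residue."""
    return lambda pair: pair["conservation"] < max_conserved


def select(pairs, filters):
    """Keep the pairs that pass every filter; any subset of filters may be used."""
    return [p for p in pairs if all(f(p) for f in filters)]


if __name__ == "__main__":
    candidates = [
        {"id": "A", "ca_distance": 5.0, "conservation": 0.20},
        {"id": "B", "ca_distance": 9.0, "conservation": 0.10},
        {"id": "C", "ca_distance": 4.0, "conservation": 0.99},
    ]
    filters = [spacing_filter(6.0), conservation_filter(0.95)]
    # Every ordering of the filters yields the same selection.
    for ordering in permutations(filters):
        print([p["id"] for p in select(candidates, ordering)])
```

Because the selection is a logical AND over the filters, a single filter can be used alone by passing a one-element list, and a split filter is simply two entries in the list.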

5.11. DNA Vector Constructs

The nucleotide sequence coding for the polypeptide, or for one, any, both, several or all of the polypeptides of a complex, or functionally active analogs or fragments or other derivatives thereof, can be inserted into an appropriate expansion or expression vector, i.e., a vector which contains the necessary elements for transcription alone, or for transcription and translation, of the inserted protein-coding sequence(s). The native genes and/or their flanking sequences can also supply the necessary transcriptional and/or translational signals.

Expression of a nucleic acid sequence encoding a polypeptide or peptide fragment may be regulated by a second nucleic acid sequence so that the polypeptide is expressed in a host transformed with the recombinant DNA molecule. For example, expression of a polypeptide may be controlled by any promoter/enhancer element known in the art.

Promoters which may be used to control gene expression include, as examples, the SV40 early promoter region, the promoter contained in the 3′ long terminal repeat of Rous sarcoma virus, the herpes thymidine kinase promoter, the regulatory sequences of the metallothionein gene; prokaryotic expression vectors comprising the β-lactamase promoter or the lac promoter; plant expression vectors comprising the nopaline synthetase promoter or the cauliflower mosaic virus 35S RNA promoter, and the promoter of the photosynthetic enzyme ribulose bisphosphate carboxylase; promoter elements from yeast or other fungi such as the Gal4 promoter, the alcohol dehydrogenase promoter, phosphoglycerol kinase promoter, alkaline phosphatase promoter; and the following animal transcriptional control regions, which exhibit tissue specificity and have been utilized in transgenic animals: elastase I gene control region which is active in pancreatic acinar cells (Swift et al. Cell; vol. 38: pp. 639-646, 1984); a gene control region which is active in pancreatic beta cells (Hanahan D., Nature; vol. 315: pp. 115-122, 1985); an immunoglobulin gene control region which is active in lymphoid cells (Grosschedl R. et al. Cell; vol. 38: pp. 647-658, 1984); mouse mammary tumor virus control region which is active in testicular, breast, lymphoid and mast cells (Leder A. et al. Cell; vol. 45: pp. 485-495, 1986); albumin gene control region which is active in liver (Pinkert C. A. et al. Genes Dev.; vol. 1: pp. 268-276, 1987); alpha-fetoprotein gene control region which is active in liver (Krumlauf R. et al. Mol. Cell. Biol.; vol. 5: pp. 1639-1648, 1985); alpha 1-antitrypsin gene control region which is active in the liver (Kelsey G. D. et al. Genes Dev.; vol. 1: pp. 161-171, 1987); beta-globin gene control region which is active in myeloid cells (Magram J. et al. Nature; vol. 315: pp. 338-340, 1985); myelin basic protein gene control region which is active in oligodendrocyte cells in the brain (Readhead C. et al. Cell; vol. 48: pp. 703-712, 1987); myosin light chain-2 gene control region which is active in skeletal muscle (Shani M. Nature; vol. 314: pp. 283-286, 1985); and gonadotropic releasing hormone gene control region which is active in the hypothalamus (Mason A. J. et al. Science; vol. 234: pp. 1372-1378, 1986).

In a specific embodiment, a vector is used that comprises a promoter operably linked to a gene nucleic acid, one or more origins of replication, and, optionally, one or more selectable markers (e.g., an antibiotic resistance gene). In bacteria, the expression system may comprise the lac-response system for selection of bacteria that contain the vector. Expression constructs can be made, for example, by subcloning a coding sequence into one of the restriction sites of each or any of the pGEX vectors (Pharmacia, Smith D. B. and Johnson K. S. Gene; vol. 67: pp. 31-40, 1988). This allows for the expression of the protein product.

Vectors containing gene inserts can be identified by three general approaches: (a) identification of one or several specific attributes of the DNA itself, such as, for example, fragment lengths yielded by restriction endonuclease treatment, direct sequencing, PCR, or nucleic acid hybridization; (b) presence or absence of “marker” gene functions; and, where the vector is an expression vector, (c) expression of inserted sequences. In the first approach, the presence of a gene inserted in a vector can be detected, for example, by sequencing, PCR or nucleic acid hybridization using probes comprising sequences that are homologous to an inserted gene. In the second approach, the recombinant vector/host system can be identified and selected based upon the presence or absence of certain “marker” gene functions (e.g., thymidine kinase activity, resistance to antibiotics, transformation phenotype, occlusion body formation in baculovirus, etc.) caused by the insertion of a gene in the vector. For example, if the gene is inserted within the marker gene sequence of the vector, recombinants containing the insert can be identified by the absence of the marker gene function. In the third approach, recombinant expression vectors can be identified by assaying the product expressed by the recombinant expression vectors containing the inserted sequences. Such assays can be based, for example, on the physical or functional properties of the protein in in vitro assay systems, for example, binding with anti-protein antibody.

Once a particular recombinant DNA molecule is identified and isolated, several methods known in the art may be used to propagate it. Once a suitable host system and growth conditions are established, recombinant expression vectors can be propagated and prepared in quantity. Some of the expression vectors that can be used include human or animal viruses such as vaccinia virus or adenovirus; insect viruses such as baculovirus; yeast vectors; bacteriophage vectors (e.g., lambda phage); and plasmid and cosmid DNA vectors.

Once a recombinant vector that directs the expression of a desired sequence is identified, the gene product can be analyzed. This is achieved by assays based on the physical or functional properties of the product, including radioactive labeling of the product followed by analysis by gel electrophoresis, immunoassay, etc.

5.12. Systems of Gene Expression and Protein Purification

A variety of host-vector systems may be utilized to express the protein-coding sequences. These include, as examples, mammalian cell systems infected with virus (e.g., vaccinia virus, adenovirus, etc.); insect cell systems infected with virus (e.g., baculovirus); microorganisms such as yeast containing yeast vectors, or bacteria transformed with bacteriophage DNA, plasmid DNA, or cosmid DNA. The expression elements of vectors vary in their strengths and specificities. Depending on the host-vector system utilized, any one of a number of suitable transcription and translation elements may be used.

In a specific embodiment, the gene may be expressed in bacteria that are protease deficient, and that have low constitutive levels and high induced levels of expression where an expression vector is used that is inducible, for example, by the addition of IPTG to the medium.

In yet another specific embodiment, the polypeptide, or one, any, both, several or all of the polypeptides of a complex may be expressed with signal peptides, such as, for example, the pelB bacterial signal peptide, which directs the protein to the bacterial periplasm (Lei et al. J. Bacteriol., vol. 169: pp. 4379, 1987). Alternatively, protein may be allowed to form inclusion bodies, and subsequently be resolubilized and refolded (Kim S. H. et al. Mol. Immunol., vol. 34: pp. 891, 1997).

In yet another embodiment, a fragment of the polypeptide, or of one, any, both, several or all of the polypeptides of a complex, comprising one or more domains of the protein is expressed. Any of the methods previously described for the insertion of DNA fragments into a vector may be used to construct expression vectors containing a chimeric gene consisting of appropriate transcriptional/translational control signals and the protein coding sequences. These methods may include in vitro recombinant DNA and synthetic techniques and in vivo recombination (genetic recombination).

In addition, a host cell strain may be chosen that modulates the expression of the inserted sequences, or modifies and processes the gene product in the specific fashion desired. Expression from certain promoters can be elevated in the presence of certain inducers; thus, expression of the genetically engineered polypeptides may be controlled. Furthermore, different host cells have characteristic and specific mechanisms for the translational and post-translational processing and modification (e.g., glycosylation, phosphorylation) of proteins. Appropriate cell lines or host systems can be chosen to ensure the desired modification and processing of the foreign polypeptide(s) expressed. For example, expression in a bacterial system can be used to produce a non-glycosylated core protein product. Expression in yeast will produce a glycosylated product. Expression in mammalian cells can be used to ensure “native” glycosylation of a heterologous protein. Furthermore, different vector/host expression systems may effect processing reactions to different extents.

In other embodiments of the invention, the polypeptide, or one, any, both, several or all of the polypeptides of a complex, and/or fragments, analogs, or derivative(s) thereof may be expressed as a fusion, or chimeric, protein product (comprising the protein, fragment, analog, or derivative joined via a peptide bond to a heterologous protein sequence of a different protein). Such a chimeric product can be made by ligating the appropriate nucleic acid sequences encoding the desired amino acid sequences to each other by methods known in the art, in the proper coding frame, and expressing the chimeric product by methods commonly known in the art. Alternatively, such a chimeric product may be made by protein synthetic techniques, for example, by use of a peptide synthesizer.

The polypeptides of a complex may be expressed together in the same cells either on the same vector, driven by the same or independent transcriptional and/or translational signals, or on separate expression vectors, for example by cotransfection or cotransformation; selection may be based, for example, on both vectors' individual selection markers. Alternatively, one, any, both, several or all of the polypeptides of a complex may be expressed separately; they may be expressed in the same expression system, or in different expression systems, and may be expressed individually or collectively as fragments, derivatives or analogs of the original polypeptide.

5.13. The Cross-Link Reaction

5.13.1. Introduction of Point Mutations to Phenylalanine

One of the codons of every tyrosine residue pair that may react with each other and cause undesirable structural and/or functional distortions is preferably point mutated to a codon that directs the expression of phenylalanine.

Point mutations can be introduced into the DNA encoding the polypeptide, or one, any, both, several or all of the polypeptides of a complex by any method known in the art, such as oligonucleotide mediated site-directed mutagenesis. Such methods may utilize oligonucleotides that are homologous to the flanking sequences of such codons, but that encode tyrosine at the selected site or sites. With these oligonucleotides, DNA fragments containing the point mutation or point mutations are amplified and inserted into the gene or genes, for example, by subcloning. One example of such methods is the application of the QuikChange™ Site-Directed Mutagenesis Kit (Stratagene, Catalog #200518); this kit uses the Pfu enzyme, which has non-strand-displacing action, to mutate any double stranded plasmid in PCR reactions. Other methods may utilize other enzymes such as DNA polymerases, or fragments and/or analogs thereof.

The plasmid or plasmids containing the point mutation or point mutations are, for example, transformed into bacteria for expansion, and the DNA is prepared as described above. The isolated, expanded, and prepared DNA may be examined to verify that it encodes the polypeptide or polypeptides of the complex, and that the correct mutation or mutations were achieved. This may, for example, be verified by direct DNA sequencing, DNA hybridization techniques, or any other method known in the art.
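At the sequence level, a tyrosine to phenylalanine point mutation is a single-nucleotide change in the standard genetic code (TAT to TTT, or TAC to TTC). The following minimal sketch shows the design of such a mutation in a coding sequence and an in silico check, analogous to verification by sequencing, that only the intended position changed. The function names and the example sequence are hypothetical; the sketch assumes an in-frame DNA coding sequence and 1-based residue numbering.

```python
# Standard genetic code: tyrosine codons and their single-base Phe counterparts.
TYR_TO_PHE = {"TAT": "TTT", "TAC": "TTC"}


def tyr_to_phe(coding_seq, residue_number):
    """Return the coding sequence with the tyrosine codon at the given
    1-based residue position replaced by a phenylalanine codon."""
    seq = coding_seq.upper()
    start = 3 * (residue_number - 1)
    codon = seq[start:start + 3]
    if codon not in TYR_TO_PHE:
        raise ValueError(f"residue {residue_number} is encoded by {codon!r}, not a tyrosine codon")
    return seq[:start] + TYR_TO_PHE[codon] + seq[start + 3:]


def differing_positions(seq_a, seq_b):
    """0-based nucleotide positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


if __name__ == "__main__":
    wild_type = "ATGTATGGCTACTAA"  # hypothetical: Met-Tyr-Gly-Tyr-stop
    mutant = tyr_to_phe(wild_type, 2)
    print(mutant, differing_positions(wild_type, mutant))
```

The same pattern, with the codon table reversed, would describe the design of a point mutation to tyrosine at a position selected for cross-linking.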

5.13.2. Purification of Gene Products

The gene product may be isolated and purified by standard methods including chromatography (e.g., ion exchange, affinity, and sizing column chromatography), ammonium sulfate precipitation, centrifugation, differential solubility, or by any other standard technique for the purification of proteins.

The functional properties may be evaluated using any suitable assay. The amino acid sequence of the protein can be deduced from the nucleotide sequence of the chimeric gene contained in the recombinant vector. As a result, the protein can be synthesized by standard chemical methods known in the art (e.g., see Hunkapiller M. et al. Nature; vol. 310(5973): pp. 105-11, 1984).

5.13.3. The Reaction

The cross-link reaction can utilize any chemical or physical reaction known in the art that specifically introduces dityrosine cross-links, such as peroxidase catalyzed cross-linking, or photodynamic cross-linking in the presence or absence of sensitizers (see Section II). Preferably, however, the reaction is catalyzed by a metallo-ion complex, as described in detail below.

Partially purified polypeptides containing appropriate tyrosine residues may be equilibrated by dialysis in a buffer, such as phosphate buffered saline (PBS), together or separately before mixing them. The catalyst is then added (on ice or otherwise). The catalyst of the reaction is any compound that will result in the above cross-link reaction. The catalyst should have the structural components that convey the specificity of the reaction, generally provided by a structure complexing a metal ion, and the ability to abstract an electron from the substrate in the presence of an oxidizing reagent, generally provided by the metal ion. An active metal is encased in a stable ligand that blocks non-specific binding to chelating sites on protein surfaces. For example, either a metalloporphyrin, such as, but not limited to, 20-tetrakis(4-sulfonateophenyl)-21H,23H-porphine manganese (III) chloride (MnTPPS) or hemin iron (III) protoporphyrin IX chloride (Campbell L. A. et al. Bioorganic and Medicinal Chemistry, vol. 6: pp. 1301-1037, 1998), or a metal ion-peptide complex, such as the tripeptide NH2-Gly-Gly-His-COOH complexing Ni++, can serve as the catalyst of the reaction. Metalloporphyrins are a class of oxidative ligand-metal complexes for which there are few, if any, high affinity sites in naturally occurring eukaryotic proteins. The reaction can also be catalyzed by intramolecular Ni++ peptide complexes, such as N- and C-terminal amino acids consisting either of 3 or more histidine residues (his-tag), or of the above GGH tripeptide. The reaction is initiated by the addition of the oxidizing reagent at room temperature or otherwise. Oxidizing reagents include, but are not limited to, hydrogen peroxide, oxone, and magnesium monoperoxyphthalic acid hexahydrate (MMPP) (Brown K. C. et al. Biochem.; vol. 34(14): pp. 4733-4739, 1995). Higher specificity can be achieved by using a photogenerated oxidant, such as the oxidant used in the process described by Fancy and Kodadek, which involves brief photolysis of tris-bipyridylruthenium(II) dication with visible light in the presence of an electron acceptor, such as ammonium persulfate (Fancy D. A. and Kodadek T. Proc. Natl. Acad. Sci., U.S.A.; vol. 96: pp. 6020-24, 1999). The optimal reaction period is preferably determined for each application; however, in cases where an optimization process is not possible, the reaction should preferably be stopped after one minute. Using a photogenerated oxidant, such as described above, the exposure to light can be less than one second. The reaction is stopped by the addition of a sufficient amount of reducing agent, such as β-mercaptoethanol, to counteract and/or neutralize the oxidizing agent.

Alternatively, the reaction may be stopped by the addition of a chelating reagent, such as, for example, EDTA or EGTA. The solution is again equilibrated by dialysis in a buffer, such as phosphate buffered saline (PBS), to remove the reagents required for the cross-link reaction, such as the oxidizing reagent, the catalyst, or the metal ion, reducing agents, chelating reagents, etc. The cross-link reaction conditions are preferably adjusted such that the polypeptide or polypeptides of a complex that have been mutated to remove undesirable tyrosyl side-chains no longer form a bond. These conditions are adjusted by varying the reaction temperature, pH, or osmolarity conditions, or by varying the concentration of the polypeptides, the catalyst, the oxidizing agent, or any other reagents that are applied toward such a reaction. The catalyst is a small molecule that diffuses easily, and can be used at varying concentrations. Tightly packed polypeptide hydrophobic cores have a degree of solvent accessibility. This may be modulated by any method known in the art, including, but not limited to, altering the reaction temperature, or adding salts, detergents, deoxycholate, or guanidinium.

5.14. Achieving a Stabilized Polypeptide or Complex

5.14.1. Point Mutation to Tyrosine and Gene Product Purification

The codons of the residues identified as a suitable pair to which the cross-link should be directed, as described above, and selected for a particular embodiment of the instant invention, are point mutated such that the resultant codons direct the expression of tyrosyl side-chains. Point mutations are introduced as described above.

The gene products are again purified as described above.

5.14.2. Cross-Linking the Polypeptide or Complex

The polypeptides now containing tyrosyl side-chains at the residues to which the cross-link reaction should be directed are subjected to the cross-link reaction under the conditions determined as described above and carried out, also as described above. The efficiency of the reaction may be examined, for example, by Western blotting experiments, in which a cross-linked complex should run at approximately the combined molecular weight of both or all polypeptides of the complex. If the bond is readily formed under the above conditions, the strength of the reaction may still be further adjusted to the minimally required strength.

In embodiments of the invention wherein the cross-link is directed to residue pairs that are buried and/or are not readily accessible to the catalyst or oxidizing reagents, secondary and higher order polypeptide structure can be temporarily dissociated to permit reagent access. For example, such an approach may be necessary when directing the cross-link to the hydrophobic core of a single polypeptide or to a buried residue pair of a polypeptide complex having very high affinity among subunits. Any means known in the art may be used to reversibly denature polypeptide structure to permit reagent access to buried residue pairs. Such means include, but are not limited to, manipulating (increasing or decreasing) salt concentration or reaction temperature, or employing detergents, or such agents as guanidine HCl. As denaturing conditions are withdrawn (e.g., by dialysis) and the polypeptide or complex begins to refold/reassociate, the catalyst and oxidizing reagents may be added, as described above.

5.15. Purification of Cross-Linked Complexes

The cross-linked polypeptide or complex may be isolated and purified from proteins in the reaction that failed to cross-link, or any other undesirable side-products, by standard methods including chromatography (e.g., sizing column chromatography, glycerol gradients, affinity), centrifugation, or by any other standard technique for the purification of proteins. In specific embodiments it may be necessary to separate polypeptides that were not cross-linked, but that homo- or heterodimerize with other polypeptides due to high affinity binding. Separation may be achieved by any means known in the art, including, for example, addition of detergent and/or reducing agents.

Yield of functionally cross-linked polypeptides or complexes can be determined by any means known in the art, for example, by comparing the amount of stabilized complex, purified as described above, with the starting material. Protein concentrations are determined by standard procedures, such as, for example, Bradford or Lowry protein assays. The Bradford assay is compatible with reducing agents and denaturing agents (Bradford, M. Anal. Biochem.; vol. 72: pp. 248, 1976); the Lowry assay has better compatibility with detergents, and the reaction is more linear with respect to protein concentrations and read-out (Lowry, O. J. Biol. Chem.; vol. 193: pp. 265, 1951).
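The yield comparison described above reduces to simple arithmetic once protein concentrations have been measured. The following sketch assumes concentrations in mg/ml and volumes in ml; the function names and the example figures are hypothetical.

```python
def protein_amount_mg(concentration_mg_per_ml, volume_ml):
    """Total protein from a measured concentration and the sample volume."""
    return concentration_mg_per_ml * volume_ml


def percent_yield(purified_mg, starting_mg):
    """Purified cross-linked product as a percentage of the starting material."""
    if starting_mg <= 0:
        raise ValueError("starting material must be a positive amount")
    return 100.0 * purified_mg / starting_mg


if __name__ == "__main__":
    starting = protein_amount_mg(2.0, 2.0)   # hypothetical starting sample
    purified = protein_amount_mg(0.5, 2.0)   # hypothetical purified product
    print(percent_yield(purified, starting))
```

Both amounts should be measured with the same assay, since the Bradford and Lowry assays respond differently to the reagents present in the sample.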

5.16. Assay of a Cross-Linked Polypeptide or Complex

5.16.1. Retained Function

Functionality

Depending on the nature of the polypeptide or polypeptide complex, retained functionality can be tested, for example, by comparing the functionality of the cross-linked complex, cross-linked as described above, with that of the polypeptide or complex before stabilization, cross-linked or stabilized by another method, or naturally stabilized by a post-translational modification that, for example, regulates the association of certain polypeptides. Assays for retained functionality can be based, for example, on the biochemical properties of the protein in in vitro assay systems. Alternatively, the polypeptide or complex can be tested for functionality by using biological assay systems. For example, the activity of a kinase can be tested in in vitro kinase assays, and a growth factor, such as a member of the IL-8 family, can be tested for activity in chemotactic cell migration assays or beta-glucuronidase release assays (Leong S. R. et al. Protein Sci.; vol. 6(3): pp. 609-17, 1997). As another example, retained enzymatic activity of a biocatalyst can be determined by any method known to one skilled in the art. The activity of an enzyme is preferably measured directly by comparing the activity of the enzyme on a substrate before and after stabilization, and quantitating the product of the reaction. As examples, such assays include, but are not limited to, visualization upon chromatographic separation of the compounds in the reaction, spectrophotometric and fluorometric analyses of reaction products, and analysis of incorporated or released detectable markers, such as, for example, radioactive isotopes. Indirect methods that include, but are not limited to, computational, structural, or other thermodynamic analyses may also be used for the determination of the activity of the stabilized biocatalyst. More specifically, as an example of a biocatalyst, the activity of a lipase, or specifically the activity of carboxylesterases catalyzing the hydrolysis of long-chain acylglycerols, is determined by any method known in the art, including, but not limited to, the measurement of the hydrolysis of p-nitrophenylesters of fatty acids with various chain lengths (≥C-10) in solution by spectrophotometric detection of p-nitrophenol at 410 nm. Where it is necessary to distinguish between lipases and esterases, the triglyceride derivative 1,2-O-dilauryl-rac-glycero-3-glutaric acid resorufin ester (available from Boehringer Mannheim Roche GmbH, Germany) may also be used as a substrate, yielding resorufin, which can be determined spectrophotometrically at 572 nm, or fluorometrically at 583 nm (Jaeger K-E et al. Annu. Rev. Microbiol. 1999. 53: pp. 315-51).

Specificity

Depending on the nature of the polypeptide or polypeptide complex, retained specificity can be tested, as examples, by comparing the specificity of the cross-linked polypeptide or complex with that of the polypeptide or complex before stabilization, cross-linked or stabilized by another method, or naturally stabilized by a post-translational modification. Assays for retained specificity can be based, for example, on enzymatic substrate specificity, or ELISA-type procedures. For example, the retained or resultant specificity of a lipase (carboxylesterase) may be determined by any method known to one skilled in the art. Non-limiting examples of such methods include using a number of fluorogenic alkyldiacylglycerols as substrates for an analysis of the biocatalyst's stereoselectivity. For a detailed description of such methods and of certain such compounds, see the article “New fluorescent glycerolipids for a dual wavelength assay of lipase activity and stereoselectivity” (Zandonella G. et al., 1997, J. Mol. Catal. B: Enzym. 3: pp. 127-30).

5.16.2. Stability

In vitro

Stability of the polypeptide or complex may be tested in vitro in, for example but not limited to, time-course experiments incubating the polypeptide or complex at varying concentrations and temperatures. Polypeptide or complex stability may also be tested at various pH levels and under various redox conditions. For all of the above conditions, the remaining levels of functional polypeptides or polypeptide complexes are determined by assaying as described above (Functionality). In the above example of a biocatalyst, improved or altered stability of a stabilized polypeptide or complex can be determined by any method known to one skilled in the art. Such methods include, but are not limited to, calorimetric and/or structural analyses, thermodynamic calculations and analyses, and comparison of the activities of the stabilized and unstabilized enzymes under their optimal conditions and under suboptimal or adverse reaction conditions, such as higher or lower temperature, pressure, pH, salt concentration, inhibitory compound, or enzyme and/or substrate concentration. Any of the above analyses may also include time course experiments directed to the determination of stabilized biocatalyst half-life and/or shelf-life. Stabilization of a biocatalyst according to the invention can also be evaluated in the context of other methods of biocatalyst stabilization. As non-limiting examples, the above enzymatic activities can be tested in immobilizing gels or other matrices, or in partial or pure organic solvents. Furthermore, a biocatalyst stabilized by any of the methods known in the art (such as directed evolution or designed mutagenesis, see Background) can also be subjected to the methods of the instant invention to achieve further stabilization.
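As one possible way to reduce a time-course experiment to a half-life, the remaining activity can be fitted to a single exponential decay. The sketch below is an illustration under that assumption, which the specification does not itself make: it assumes first-order loss of activity, positive activity readings, and a hypothetical function name and data set. The fit is an ordinary least-squares line through ln(activity) versus time.

```python
import math


def fit_half_life(times, activities):
    """Fit ln(activity) = ln(A0) - k * t by least squares.

    Returns (k, half_life) in the time units of `times`.
    Assumes first-order decay and strictly positive activities."""
    if len(times) != len(activities) or len(times) < 2:
        raise ValueError("need at least two matched time/activity points")
    logs = [math.log(a) for a in activities]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    k = -sxy / sxx  # decay rate constant
    if k <= 0:
        raise ValueError("activity does not decay over the time course")
    return k, math.log(2) / k


if __name__ == "__main__":
    # Hypothetical readings: activity halves every hour.
    k, half_life = fit_half_life([0, 1, 2, 3], [100.0, 50.0, 25.0, 12.5])
    print(k, half_life)
```

Comparing the fitted half-life of the stabilized and unstabilized polypeptide under the same incubation conditions gives a single figure for the gain in stability; a poor fit suggests that the decay is not first order and that another model is needed.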

In Vivo

Pharmaceutical and therapeutic applications are best tested in vivo or under conditions that resemble physiological conditions (see also, below). The stability of the polypeptide or complex may be tested in, for example but not limited to, serum, incubating the polypeptide or complex in time-course experiments at various temperatures (e.g., 37, 38, 39, 40, 42, and 45° C.), and at different serum concentrations, and assaying for the remaining levels of functional polypeptides or complexes. Furthermore, stability of a polypeptide or complex in the cytoplasm may be tested in time-course experiments in cell-lysates, lysed under various conditions (e.g., various concentrations of various detergents) at different temperatures (e.g., 37, 38, 39, 40, 42, and 45° C.), and assaying for the remaining levels of functional polypeptides or complexes. More directly, stability in the cytoplasm may be tested in time-course experiments by scrape-loading tissue culture cells with stabilized polypeptide or complex and assaying for the remaining levels of function. The stability of the polypeptide or complex may also be tested by injecting it into an experimental animal and assaying for specific activity. Alternatively, the compound may be recovered from the animal at an appropriate time point, or several time points, and assayed for activity and stability, as described above.

5.16.3. Biodistribution

To determine the utility of a stabilized polypeptide or polypeptide complex more directly, biodistribution and/or other pharmacokinetic attributes may be determined. In a specific embodiment, a stabilized polypeptide or polypeptide complex may be injected into a model organism and assayed by tracing a marker, such as but not limited to, ¹²⁵I or ¹⁸F radiolabels (Choi C. W. et al. Cancer Research, vol. 55: pp. 5323-5329, 1995), and/or by tracing activity as described above (Colcher D. et al. Q. J. Nucl. Med. vol. 44(4): pp. 225-241, 1998). Relevant information may be obtained, for example, by determining the amount of functional polypeptide or polypeptide complex that can be expected to be pharmaceutically active due to its penetration of the specifically targeted tissue, such as, for example, a tumor. Half-life in the circulation and at the specifically targeted tissue, renal clearance, immunogenicity, and speed of penetration may also be determined in this context.

5.16.4. Animal and Clinical Studies

Utility of a stabilized polypeptide or complex can be determined directly by measuring its pharmacological activity, either in animal studies or clinically. In a specific embodiment, such measurements may include, for example, measurements with which tumor pro- or regression is monitored upon treatment of an animal model or one or several patients with a stabilized polypeptide or complex designed as an anti-cancer pharmacological agent. In another embodiment, such measurements may include, for example, measurements of bone mass, such as x-ray measurements, upon treatment of an animal model or one or several patients with a stabilized polypeptide or complex designed as an anti-menopausal bone-loss pharmacological agent.

5.17. Troubleshooting

5.17.1. Polypeptide or Complex not Cross-Linked

If the polypeptide or polypeptides of a complex should not become cross-linked and stabilized by the above described reaction, as determined, for example, by non-reducing Sodium Dodecyl Sulphate Polyacrylamide Gel Electrophoresis (SDS PAGE), there may be several explanations and solutions to the problem.

Adjust Polypeptide Concentrations, Salt/Osmolarity and/or pH Conditions

For the stabilization of a polypeptide complex, the least problematic explanation may be that the polypeptides, as they are not yet stabilized, do not form a sufficiently stable complex in solution for the cross-link to form under the present conditions of the reaction. This could, for example, be determined by immunoprecipitating one of the polypeptides by any method known in the art, and assaying for the presence and relative quantity of the other polypeptide(s) of the complex in the precipitate, for example, by Western blotting.

Should this be (one of) the problem(s), it may be possible to increase the strength of the polypeptides' association with each other by any means known in the art, including, but not limited to, by adjusting certain conditions of the reaction, such as, but not limited to, salt, Tris, or protein concentration, or by adjusting the pH of the reaction. If thereby the strength of the polypeptides' association is increased, for example, as determined by non-reducing SDS PAGE, the cross-link reaction should be tried again under these conditions.

The opposite could also be the problem: the polypeptides of a complex, or the polypeptide structures of a single polypeptide, associate with each other too tightly, the tyrosyl side-chains are not exposed to the catalyst or oxidizing reagents, and the dityrosine bond does not form. In such cases, the protein sub- or secondary structures or the polypeptides of a complex are first dissociated by any means known in the art, as described above, by adjusting, for example, but not limited to, the concentrations of salt, detergent, guanidine HCl, and/or any other agents that cause reversible denaturation, temperature, pressure, and/or reaction time. It may also, for example, be possible to add the oxidizing agent and catalyst at an earlier or later time-point, as the above conditions are reversed, as described above, and the polypeptide or polypeptide complex begins to refold/reassociate.

Increase Strength of Reaction Conditions

Should the cross-link not form in spite of appropriate polypeptide folding or good complex formation under the conditions of the reaction, the next solution could be to increase the strength of the conditions of the reaction, e.g. by increasing the concentration of the oxidizing reagent and/or of the catalyst. A preferred method would still use the minimal strength of the reaction required for the cross-link to form.

Identify Second-Site Mutation

It may be possible, by screening a library of mutants of the polypeptide or polypeptide complex to be cross-linked, to identify second-site mutations that alter the fold and/or structure of the polypeptide or polypeptide complex in such a way that the cross-link can form. Such second-site mutations may be identified by any methods known in the art, such as, for example, but not limited to, any of the in vitro evolutionary approaches (see above).

Direct Cross-Linking Reaction to an Alternative Residue Pair

The cross-link may have been directed to a pair of tyrosines that cannot be cross-linked due to structural elements not captured in the selection process. Should the above approaches not cause the cross-link to form between the selected residues of a pair encoding tyrosine under any conditions, another residue pair may be selected, and the cross-link reaction tried again, where necessary adjusting the reaction conditions, as described above.

Combined Approach

It may be necessary to employ one, two, any, several, or all of the above approaches to trouble-shooting to achieve the desired stabilizing dityrosine bond.

5.17.2. Compromised Functionality of Polypeptide or Complex

Decrease Strength of Reaction Conditions

Reducing the strength of the reaction by adjusting, for example, but not limited to, the concentration of either the catalyst or the oxidizing reagent, the temperature, pressure, and/or reaction time, may result in a stabilized polypeptide or polypeptide complex with better retained functionality.

Adjust Protein Concentrations, Salt/Osmolarity and/or pH Conditions

Non-specific cross-link reactions, which may occur under certain reaction conditions, such as, but not limited to, high protein concentrations relative to the optimum, certain pH levels, or certain concentrations of salt, detergent, denaturing agents, and/or any other components of the reaction, may compromise the functionality of the polypeptide or polypeptide complex. These conditions may be adjusted to minimize or eliminate the formation of non-specific, compromising dityrosine bonds.

Identify Second-Site Mutation

It may be possible, by screening a library of mutants of the polypeptide or polypeptide complex to be cross-linked, to identify second-site mutations that alter the fold and/or structure of the polypeptide or polypeptide complex in such a way that its functionality upon cross-linking is restored. Such second-site mutations may be identified by any methods known in the art, such as, for example, but not limited to, any of the in vitro evolutionary approaches (see above).

Direct Cross-Linking Reaction to an Alternative Residue Pair

Because the input data for the selection process is often less than completely accurate, or for any other reason, the selection process may yield residue pairs that, when cross-linked, distort the overall structure of the polypeptide or polypeptide complex, and thereby compromise or alter its functionality. Should this be the case, another pair that the selection process yielded should be mutated such that both residues encode tyrosine, the cross-link reaction should be tried again, and retained functionality tested.

Combined Approach

Of course, it may be necessary to employ one or more of the above approaches to trouble-shooting to achieve the desired stabilizing dityrosine bond.

5.18. Software for Selection Process

This invention provides software that permits automated selection of suitable residue pairs at which a di-tyrosine bond can be placed. Such software can be used in accordance with the geometrical, physical, and chemical criteria described above (see especially Identification of Suitable Residue Pairs for the Reaction), and a Residue Pair Selection Flowchart such as is set forth in Section 6 below. As described above, a successive array of Filters is implemented and residue pairs that “pass” through the filters comprise the selected residue pairs (FIG. 14, left side). Alternatively, filters can be implemented to process all residue pairs in a parallel array (FIG. 14, right side). Residue pairs that “pass” through a filter define that filter's set of passed pairs. In a preferred embodiment, residue pairs that are in all filters' passed sets (i.e. residue pairs that form the intersection of all filter sets) are the selected pairs. The filter requirements are as described above (Identification of Suitable Residue Pairs for the Reaction).
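The two filter arrangements described above can be sketched as follows. This is a minimal illustration, not the software of the invention: the function names, the `Pair` and `Filter` types, and the two example filters are assumptions made for this sketch, and the distance values are taken from Table 1 purely as sample input.

```python
from typing import Callable, Iterable

Pair = tuple[int, int]            # assumed representation: (light, heavy) residue numbers
Filter = Callable[[Pair], bool]   # a filter returns True when a pair "passes"


def select_sequential(pairs: Iterable[Pair], filters: list[Filter]) -> set[Pair]:
    """Successive array: each filter only sees pairs that passed the previous one."""
    remaining = set(pairs)
    for passes in filters:
        remaining = {p for p in remaining if passes(p)}
    return remaining


def select_parallel(pairs: Iterable[Pair], filters: list[Filter]) -> set[Pair]:
    """Parallel array: every filter sees all pairs; keep the intersection of passed sets."""
    all_pairs = set(pairs)
    passed_sets = [{p for p in all_pairs if passes(p)} for passes in filters]
    return set.intersection(*passed_sets) if passed_sets else all_pairs


# Sample input: average alpha carbon distance and standard deviation (Å) for three pairs.
avg_distance = {(43, 91): 8.04, (44, 105): 8.95, (36, 45): 10.38}
stdev = {(43, 91): 0.71, (44, 105): 0.55, (36, 45): 0.23}
T = 8.72  # optimal distance from Filter 2

example_filters = [
    lambda p: 5.70 <= avg_distance[p] <= 11.74,          # spacing range
    lambda p: abs(T - avg_distance[p]) <= 2 * stdev[p],  # T within 2 standard deviations
]

selected = select_sequential(avg_distance, example_filters)
```

For filters that judge each pair independently, as assumed here, both arrangements select the same pairs; the successive form simply evaluates fewer pairs in the later filters.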

5.19. Pharmaceutical Compositions

In one embodiment, this invention provides a pharmaceutical composition comprising an effective amount of a stabilized polypeptide or polypeptide complex, and a pharmaceutically acceptable carrier. As used herein, “an effective amount” means an amount required to achieve a desired end result. The amount required to achieve the desired end result will depend on the nature of the disease or disorder being treated, and can be determined by standard clinical techniques. In addition, in vitro assays may optionally be employed to help identify optimal dosage ranges. The precise dose to be employed will also depend on the route of administration and the seriousness of the disease or disorder, and should be decided according to the judgment of the practitioner and each subject's circumstances. Effective doses may be extrapolated from dose-response curves derived from in vitro or animal model test systems.

Various delivery systems are known and can be used to administer a pharmaceutical composition of the present invention. Methods of introduction include but are not limited to intradermal, intramuscular, intraperitoneal, intravenous, subcutaneous, intranasal, epidural, and oral routes. The compounds may be administered by any convenient route, for example by infusion or bolus injection, by absorption through epithelial or mucocutaneous linings (e.g., oral mucosa, rectal and intestinal mucosa, etc.) and may be administered together with other biologically active agents. Administration can be systemic or local. In addition, it may be desirable to introduce the pharmaceutical compositions of the invention into the central nervous system by any suitable route, including intraventricular and intrathecal injection; intraventricular injection may be facilitated by an intraventricular catheter, for example, attached to a reservoir, such as an Ommaya reservoir. Pulmonary administration can also be employed, e.g., by use of an inhaler or nebulizer, and formulation with an aerosolizing agent.

In a specific embodiment, it may be desirable to administer the pharmaceutical compositions of the invention locally to the area in need of treatment; this may be achieved by, for example, and not by way of limitation, local infusion during surgery, by injection, by means of a catheter, or by means of an implant, said implant being of a porous, non-porous, or gelatinous material, including membranes, such as silastic membranes, or fibers. In one embodiment, administration can be by direct injection at the site (or former site) of a malignant tumor or neoplastic or pre-neoplastic tissue.

In another embodiment, pharmaceutical compositions of the invention can be delivered in a controlled release system. In one embodiment, a pump may be used (see Langer, supra; Sefton, CRC Crit. Ref. Biomed. Eng.; vol. 14: pp. 201, 1987; Buchwald et al., Surgery; vol. 88: pp. 507, 1980; Saudek et al., N. Engl. J. Med.; vol. 321: pp. 574, 1989). In another embodiment, polymeric materials can be used (see Medical Applications of Controlled Release, Langer and Wise (eds.), CRC Pres., Boca Raton, Fla., 1974; Controlled Drug Bioavailability, Drug Product Design and Performance, Smolen and Ball (eds.), Wiley, New York, 1984; Ranger and Peppas, J. Macromol. Sci. Rev. Macromol. Chem.; vol. 23: pp. 61, 1983; see also Levy et al. Science; vol. 228: pp. 190, 1985; During et al. Ann. Neurol.; vol. 25: pp. 351, 1989; Howard et al. J. Neurosurg; vol. 71: pp. 105, 1989). In yet another embodiment, a controlled release system can be placed in proximity of the therapeutic target, i.e., the brain, thus requiring only a fraction of the systemic dose (see, e.g., Goodson, in Medical Applications of Controlled Release, supra, vol. 2, pp. 115-138, 1984).

Other controlled release systems are discussed in the review by Langer (Science; vol. 249: pp. 1527-1533, 1990).

In a preferred embodiment, the composition is formulated in accordance with routine procedures as a pharmaceutical composition adapted for intravenous administration to human beings. Typically, compositions for intravenous administration are solutions in sterile isotonic aqueous buffer. Where necessary, the composition may also include a solubilizing agent and a local anesthetic such as lidocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water free concentrate in a hermetically sealed container such as an ampoule or sachette indicating the quantity of active agent. Where the composition is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients may be mixed prior to administration.

5.20. Considerations for Pharmaceutical Compositions

Stabilized polypeptides or polypeptide complexes of the invention should be administered in a carrier that is pharmaceutically acceptable. The term “pharmaceutically acceptable” means approved by a regulatory agency of the Federal or a state government or listed in the U.S. Pharmacopeia or other generally recognized pharmacopeia or receiving specific or individual approval from one or more generally recognized regulatory agencies for use in animals, and more particularly in humans. The term “carrier” refers to a diluent, adjuvant, excipient, or vehicle with which the therapeutic is administered. Such pharmaceutical carriers can be sterile liquids, such as water, organic solvents, such as certain alcohols, and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like. Buffered saline is a preferred carrier when the pharmaceutical composition is administered intravenously. Saline solutions and aqueous dextrose and glycerol solutions can also be employed as liquid carriers, particularly for injectable solutions. The composition, if desired, can also contain minor amounts of wetting or emulsifying agents, or pH buffering agents. These compositions can take the form of solutions, suspensions, emulsion and the like. Examples of suitable pharmaceutical carriers are described in “Remington's Pharmaceutical Sciences” by E. W. Martin. Such compositions will contain a therapeutically effective amount of the Therapeutic, preferably in purified form, together with a suitable amount of carrier so as to provide the form for proper administration to the patient. The formulation should suit the mode of administration. In a preferred embodiment, the composition is formulated in accordance with routine procedures as a pharmaceutical composition adapted for intravenous administration to human beings. Typically, compositions for intravenous administration are solutions in sterile isotonic aqueous buffer.

6. EXAMPLE I Stabilized Fv Fragments

The following example illustrates certain variations of the methods of the invention for protein and protein complex stabilization. This example is presented by way of illustration and not by way of limitation to the scope of the invention.

6.1. Introduction

Several polypeptides and polypeptide complexes with significant commercial value have been identified in recent years, and furthermore, several modular domains have been identified that mediate protein-protein interactions. For many of these domains, the interaction sites with other proteins have also been mapped.

In the following section, methods of stabilizing one such complex, an Fv fragment complex, for which an abundance of data is available, are described in detail. Specifically, described below are the assembly of relevant databases for the selection process, the selection process itself, the introduction of point mutations, bacterial expression of the polypeptides and their purification, adjustment of the cross-link reaction conditions, the cross-link reaction itself, and analysis of the resulting stabilized complex.

The input data for the 2-D database is obtained from Weir's Handbook of Experimental Immunology I. Immunochemistry and Molecular Immunology, Fifth Edition. The input data for the 3-D database is obtained from the Brookhaven National Laboratory Protein Database. The derivative data relevant to the selection process in both databases is calculated as described. The selection process is carried out using a set of filters that is convenient and appropriate for this application of the instant invention.

Point mutations to tyrosine (directing the cross-link reaction) are introduced according to the final selection of the selection process, and point mutations to phenylalanine (limiting the cross-link reaction) according to the specific sequence of each Fv fragment and the corresponding and relevant structural information contained in the 3-D database. The polypeptides of the complex are expressed bacterially as GST fusion proteins, and purified over a GT-affinity column. The purified polypeptides of the complex are proteolytically cleaved from the GST parts of the fusion proteins, and the GST polypeptide is removed, again using a GT affinity column.

The minimally required reaction conditions are adjusted using a construct with the mutations to phenylalanine, but lacking the mutations to tyrosine, and the cross-link reaction is then carried out with the constructs containing both sets of point mutations. The efficiency of the reaction is tested, and the resulting stabilized Fv fragments are then tested for retained affinity, stability, immunogenicity, and biodistribution characteristics.

6.2. Advantages of the Tyrosyl-Tyrosyl Cross-Link for Fv Fragments

The underlying chemistry of the technology covered by the present invention causes an oxidative cross-link to form between reactive side-chains of proteins that form stable complexes. Because the cross-linking reaction is catalyzed, once established, the cross-link is stable in the absence of the catalyst under a broad range of pH and redox conditions. The cross-link reaction requires very close proximity between the molecules that will cross-link and therefore only occurs between molecules that normally interact and associate closely in solution and is therefore limited to molecules that have legitimate functional interactions.

Thus, the current invention describes a new technology that will allow stabilization of immunoglobulin-derived conjugates and result in both a very high degree of stability and minimal immunogenicity in therapeutic contexts. This technology is designed to improve on preceding, and complement compatible, technologies.

The resultant stabilized Fv fragments will have the following characteristics:

-   -   1. The conjugates will be stable under a broad range of pH and redox conditions and at high protein concentrations.
    -   2. The resultant cross-linked complex will be minimally immunogenic since no exposed residues are altered.

This Fv fragment stabilization technology is well suited for the development of new products with novel applications, the improvement of existing immunoglobulin-based products, and the complementation of existing technologies for the development of novel immunoglobulin applications.

6.3. Fv Fragment Applications

There is a wide spectrum of potential applications for immunoglobulin-based products, the limits of which are determined by the following factors:

The target must be in an environment that is accessible to immunoglobulin-derived products, such as, for example, serum, the extracellular matrix, the brain, or the intracellular space by way of liposomes (Hoffman R. M. J. Drug Target.; vol. 5(2): pp. 67-74, 1998) or peptide induced cellular uptake (Schwarze S. R. et al. Science; vol. 285: pp. 1565-72, 1999). For intracellular applications of immunoglobulin, see Bosilevac J. M. et al. J. Biol. Chem.; vol. 273(27): pp. 16874-79, 1998; Graus-Porta D. et al. Mol. Cell Biol.; vol. 15: pp. 1182-91, 1995; Richardson J. H. et al. Proc. Nat. Acad. Sci., USA; vol. 92: pp. 3137-41, 1995; Maciejewski J. P. et al. Nat. Med.; vol. 1: pp. 667-73, 1995; Marasco W. A. et al. Proc. Nat. Acad. Sci., USA; vol. 90: pp. 7889-93, 1993; Levy Mintz P. et al. J. Virol.; vol. 70: pp. 8821-32, 1996; Duan L. et al. Hum. Gene Ther.; vol. 6(12): pp. 1561-73, 1995; and Kim S. H. et al. Mol. Immunol.; vol. 34(12-13): pp. 891-906, 1997. A favorable environment is present in all tissues and organs that are reached by the blood supply, and where the target molecule is present on the cell surface or in the extra-cellular matrix. Since the functionality of immunoglobulin-derived Fv fragments is primarily to bind to target molecules, binding to the target should preferably suffice to accomplish the desired therapeutic or diagnostic effect. Catalytic functionality is, however, also known for immunoglobulin, and may therefore also be achieved in pharmacological and/or industrial contexts (Pluckthun A. et al. Ciba Found. Symp.; vol. 159: pp. 103-12; discussion 112-7, 1991; Kim S. H. et al. Mol. Immunol, vol. 34: pp. 891-906, 1997).

There is a multitude of potential immunoglobulin-based applications that meet these criteria, and it is the purpose of the following paragraphs only to point out certain relevant applications, as examples.

6.3.1. Drug Delivery/Tissue Targeting

Many existing applications of immunoglobulin therapy make use of antibodies' ability to direct therapeutic agents to the targeted tissues. Such therapeutic agents have thus far been toxins and radioisotopes targeted to tumors by linkage to anti-tumor associated antigen or anti-tumor specific antibodies, on the one hand, and diagnostic agents, i.e. antibodies linked to an imaging agent, on the other hand.

6.3.2. Modulation of Extra-Cellular Biochemical Processes

There are a multitude of biochemical processes that are of therapeutic, and thus of commercial, relevance that occur in extra-cellular milieus, such as blood serum. One example of such a process is the process of blood clotting. In this example, the immunoglobulin binds to one of the proteins involved in the biochemical cascade of reactions that lead to the formation of blood clots, and interrupts this cascade, thereby blocking the formation of blood clots. The therapeutic value of being able to inhibit the formation of blood clots, indeed, spurred the development of one of the first immunoglobulin-based pharmaceuticals to enter the market.

6.4. Selection of Optimal Residues for Tyrosyl-Tyrosyl Cross-Link

The selection process consisted of a series of statistical tests or ‘filters’ aimed at successively narrowing down the residue pairs most likely to result in a cross-linked heavy chain-light chain tyrosine pair that minimally alters the Fv fragment's structural characteristics.

6.4.1. Data Used for the Analysis

Residue amino acid usage data is data compiled on amino acids encoded and expressed at each residue of known and sequenced Fv fragments. It is collected in, and obtained from, the publication “Proteins of Immunological Interest”, Kabat and Wu, Government Printing Office, NIH Publication 91-3242, 1991 (“K&W”). The amino acid sequences in this publication are ordered according to a standardized numbering system that takes into account the gene structure of the heavy and light chain variable regions. In the variable regions of the heavy and light chains alike, four Framework Region segments (FRs), which are relatively conserved, are interspersed with three highly variable Complementarity Determining Regions (CDRs). The CDRs contain the amino acids that determine the antibody's specificity, and that physically contact the antigen. Aligning all sequences according to the K&W numbering system was very important for the purpose of performing a statistical analysis as described in this example since the corresponding residues of the FRs are thereby always aligned, regardless of the varying sequence lengths of the interspersed CDRs. This ensured that statistical measurements were made with sets of data containing appropriate and comparable data points.

Coordinate data for distance calculations of all atoms other than hydrogens of 17 Fv fragments from crystallographically solved immunoglobulin structures was downloaded from the protein structure database of Brookhaven National Laboratory (www.bnl.pdb.gov; FIG. 5). These data provide the three-dimensional coordinates (x, y, and z) for each atom in a solved structure, expressed in metric units, i.e. Angstroms (10⁻¹⁰ m, Å). With this data it was possible to calculate the three-dimensional distances between any desired atoms (e.g. amino acid alpha and beta carbon atoms) and to calculate statistical measurements of the variability of such distance between the different Fv fragments in the sample being analyzed (FIGS. 5, 6, and 7).

6.4.2. Selection Methodology

Optimal residues, to which the cross-link reaction is directed, were selected by a series of filters based on the statistical measurements of values in databases compiled for the purposes of this selection. These databases contain numeric measurements of (1) alpha carbon spacing, (2) beta carbon spacing and the difference between the alpha and beta distances, and (3) residue amino acid usage (see below).

6.5. Filter 1: Elimination of Residue Pairs with Glycines

Glycine is the smallest of the amino acids, has no beta carbon, and is often associated with positional flexibility of protein structures. Substitution of a glycine with one of the largest amino acids, tyrosine, would likely have too great an impact on the overall structure of the protein complex, and thereby on the antigen-binding characteristics of the cross-linked Fv fragment. Therefore, as a first cut, from among all candidate residue pairs of the Framework Regions, those pairs of which one of the residues is most frequently a glycine (as determined by comparison with the K&W data) were eliminated a priori. For the purposes of this analysis, ‘most frequent’ occurrence of a particular amino acid at a given residue was defined as occurrence in more than 75% of the sample.

TABLE 1 Heavy chain-light chain candidate pairs with average alpha carbon distance measurements, m_x, within the range of 5.70 Å to 11.74 Å (sorted by K&W numbering, first on the light chain, second on heavy chain positions).

Light   Heavy   AVERAGE   STDEV
36      45      10.38     0.23
36      103     10.99     0.31
37      45      11.49     0.36
38      39      11.49     0.18
38      45      10.17     0.43
38      103     11.26     0.41
40      41      11.27     1.50
40      43      11.68     1.34
42      39      11.04     0.84
42      89      10.28     0.99
42      90      11.72     0.88
42      91      10.5      0.66
42      103     10.13     0.34
42      105     7.14      0.40
42      107     11.18     0.82
43      4       11.50     0.56
43      37      10.94     0.87
43      38      10.97     0.98
43      39      10.34     0.79
43      45      10.78     0.71
43      89      9.95      0.71
43      90      10.23     0.72
43      91      8.04      0.71
43      92      10.21     0.59
43      93      10.14     0.65
43      103     6.74      0.51
43      105     5.74      0.44
43      107     10.66     0.62
44      37      10.58     0.39
44      38      11.31     0.50
44      39      10.73     0.48
44      45      9.43      0.48
44      91      9.33      0.33
44      92      10.91     0.40
44      93      9.74      0.29
44      103     6.92      0.30
44      105     8.95      0.55
45      93      10.43     0.41
45      103     7.40      0.41
45      105     10.95     0.45
46      93      10.78     0.40
46      94      11.19     0.25
46      103     8.98      0.33
85      43      11.04     0.49
85      45      10.93     0.37
86      45      10.63     0.35
87      43      11.64     0.32
87      45      8.19      0.25
87      46      10.90     0.33
88      45      10.04     0.10
88      46      11.69     0.21
98      37      10.24     0.31
98      38      11.25     0.25
98      39      11.17     0.20
98      43      11.60     0.39
98      45      6.49      0.18
98      46      6.66      0.29
98      48      7.65      0.57
98      49      11.37     0.58
100     39      11.42     0.29
100     43      8.27      0.41
100     45      7.82      0.27
100     46      9.56      0.46
102     43      11.47     0.36

6.6. Filter 2: Identification of Appropriately Spaced Residue Pairs

To find residue pairs spaced appropriately for a tyrosyl-tyrosyl bond, the alpha carbon to alpha carbon distances from every residue in the light chain to every residue in the heavy chain in Fv fragments represented in the Brookhaven National Protein Structure Database were calculated in a 3D database. This calculation was performed by applying Pythagorean geometry to the 3D coordinates of the alpha carbons (FIG. 6). For every combination of heavy and light chain residues, the average, standard deviation, range and median of the alpha carbon-alpha carbon distance was calculated on the Fv fragments in the sample (FIG. 7). Based on the calculations above, as a second cut, all residue pairs were selected whose alpha carbons are spaced at an average, m, within the selection range. The range that was selected for was the following:

-   -   Min 5.70 Å, Max 11.74 Å.

The optimal distance (T) was calculated by averaging the maximum and the minimum of the range. Therefore, T = (5.70 Å + 11.74 Å)/2 = 8.72 Å.

In this example, 64 residue pairs met this criterion, listed in Table 1.
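The distance calculation and range test of Filter 2 can be sketched as below. This is an illustrative sketch only: the function names are invented for the example, and the use of the sample (n−1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics


def distance(a, b):
    """Euclidean ("Pythagorean") distance between two (x, y, z) coordinates in Å."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def summarize(distances):
    """Average, standard deviation, range and median of one residue pair's
    distances across the Fv fragments in the sample."""
    return {
        "average": statistics.mean(distances),
        "stdev": statistics.stdev(distances),  # assumption: sample standard deviation
        "min": min(distances),
        "max": max(distances),
        "median": statistics.median(distances),
    }


MIN_DIST, MAX_DIST = 5.70, 11.74     # selection range from the text, in Å
T = (MIN_DIST + MAX_DIST) / 2        # optimal distance, 8.72 Å


def passes_filter_2(average_distance):
    """True when the pair's average alpha carbon distance lies in the selection range."""
    return MIN_DIST <= average_distance <= MAX_DIST
```

In use, `distance` would be applied to the alpha carbon coordinates of one light chain residue and one heavy chain residue in each solved structure, and `summarize` to the resulting list of distances for that pair.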

6.7. Filter 3: Identification of Residue Pairs with Sufficient Positional Flexibility

In order to identify residue pairs at which substitution to tyrosine is minimally disruptive, residue pairs with significant positional flexibility were selected. Therefore, residue pairs were eliminated from among those in Table 1 in which the optimal distance, 8.72 Å, does not fall within 2 times that specific residue pair's standard deviation from its average. In this example, 36 residue pairs met this criterion.

Furthermore, the relative positional flexibility of the remaining 12 candidate residue pairs was rated according to the following formula:

Rating I = a_x² / σ_x

-   -   a_x = T − μ_x + 2σ_x, for all μ_x ≥ T
    -   a_x = μ_x + 2σ_x − T, for all μ_x ≤ T
    -   T = optimal distance
    -   μ_x = the average distance for any given residue pair
    -   σ_x = standard deviation of the distance for any given residue pair

Thus, residue pairs that scored highly under this metric are those that (i) have an average spacing close to the optimal distance, and/or (ii) have a large standard deviation. The remaining 12 residue pairs are listed, sorted by Rating I, in Table 2.

TABLE 2 Residue pairs of Table 1 selected¹ and rated by Rating I².

Heavy   Light   Rating I   AVG     STDEV
44      105     1.35       8.95    0.55
43      91      0.76       8.04    0.71
46      103     0.49       8.98    0.33
100     43      0.33       8.27    0.41
43      37      0.26       10.9    0.87
42      89      0.17       10.3    0.99
40      41      0.14       11.3    1.50
44      45      0.13       9.43    0.48
43      89      0.06       9.95    0.71
100     46      0.01       9.56    0.46
98      48      0.01       7.56    0.57
44      91      0.01       9.33    0.33

¹Selection criterion: optimal distance (T) must fall within the range of the residue pair's specific distance average (μ_x) +/− 2 times the residue pair's specific standard deviation (σ_x).
²Rating I formula: a_x²/σ_x, where T is the optimal distance, and a_x = T − μ_x + 2σ_x, for all μ_x ≥ T, and a_x = μ_x + 2σ_x − T, for all μ_x ≤ T.
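The selection criterion and Rating I formula given in the footnotes to Table 2 can be written out as a short sketch. The function names are assumptions made for illustration; the numeric inputs in the usage lines are the rounded averages and standard deviations printed in the table.

```python
def within_two_sigma(target, mean, stdev):
    """Selection criterion: the target lies within mean +/- 2 standard deviations."""
    return abs(target - mean) <= 2 * stdev


def rating(target, mean, stdev):
    """Rating formula a_x**2 / sigma_x, where a_x is the distance from the target
    to the far edge of the 2-sigma band on the side of the target."""
    if mean >= target:
        a = target - mean + 2 * stdev
    else:
        a = mean + 2 * stdev - target
    return a * a / stdev


T = 8.72  # optimal alpha carbon distance in Å

# Pair with average 8.04 Å and standard deviation 0.71 Å (second row of Table 2).
r_first = rating(T, 8.04, 0.71)
# Pair with average 8.98 Å and standard deviation 0.33 Å (third row of Table 2).
r_second = rating(T, 8.98, 0.33)
```

Evaluated on the rounded table values, the two ratings come out near 0.77 and 0.48, close to the tabulated 0.76 and 0.49; the small differences are presumably due to rounding of the published averages and standard deviations.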

6.8. Filter 4: Side-Chain Orientation

In the space that the heavy and light chains occupy, the tyrosine side chains should be oriented toward each other for a cross-link to form with minimal structural distortion. The difference between the alpha carbon distance (i.e. the backbone carbon distance; FIG. 6) and the beta carbon distance (i.e. the distance between the first carbons in each side chain; FIG. 8) of each residue pair was calculated as a proxy, i.e. an estimate of the orientation of the side chains relative to each other (FIG. 9).

The range that was selected for was the following:

-   -   Min −0.5 Å, Max 2.0 Å.

The optimal distance difference (D) was calculated by averaging the maximum and the minimum of the range. Therefore, D = (−0.5 Å + 2.0 Å)/2 = 0.75 Å.

Again, based on 3D coordinate geometry, for each residue pair, the distance between the beta carbons was calculated (FIG. 8). The beta distance was then subtracted from the alpha distance of the residue pair (FIG. 9). This filter was based on whether the average difference in the alpha and beta distances of a residue pair (FIGS. 10 and 11) falls within the estimated optimal range. In this example, 12 residue pairs met this criterion, listed in Table 3.

TABLE 3 Residue pairs of Table 2 selected by average alpha-beta distance difference.

                           Alpha carbon distance   Alpha-beta distance difference
Heavy   Light   Rating I   AVG      STDEV          AVG      STDEV
91      43      0.76       8.04     0.71           1.33     0.70
45      43      0.56       10.78    0.71           −0.04    0.31
103     46      0.49       8.98     0.33           0.81     0.18
39      42      0.48       11.04    0.84           0.21     0.14
91      42      0.30       10.5     0.66           −0.14    0.17
37      43      0.26       10.94    0.87           0.81     0.59
89      42      0.17       10.28    0.99           0.01     0.06
92      43      0.15       10.21    0.59           −0.23    0.61
89      43      0.06       9.95     0.71           0.71     0.36
93      43      0.02       10.14    0.65           1.07     0.73
48      98      0.01       7.65     0.57           0.87     0.17
30      43      0.00       10.34    0.79           0.41     0.28

Furthermore, analogously to the selection based on alpha carbon distances, those pairs were eliminated for which the optimal average distance difference, 0.75 Å, does not fall within 2 times that residue pair's specific standard deviation from its average. The remaining residue pairs were then rated according to the following formula:

Rating II = a_x² / σ_x

-   -   a_x = D − μ_x + 2σ_x, for all μ_x ≥ D
    -   a_x = μ_x + 2σ_x − D, for all μ_x ≤ D
    -   D = optimal distance difference
    -   μ_x = the average distance difference for any given residue pair
    -   σ_x = standard deviation of the distance difference for any given residue pair

Of the set of potential residue pairs listed in Table 4, five pairs met these criteria. This set of potential residue pairs is listed in Table 5.

TABLE 4 Residue pairs of Table 5 selected¹ and rated according to Rating II²

                 Difference between C-alpha and C-beta distances   Alpha Carbon distance
Heavy   Light    Rating II   Average   Stdev                       Rating I   Average   Stdev
92      43       0.10        −0.23     0.61                        0.15       10.21     0.59
39      43       0.17        0.41      0.28                        0.00       10.34     0.79
48      98       0.30        0.87      0.17                        0.01       7.65      0.57
103     46       0.49        0.81      0.18                        0.49       8.98      0.33
91      43       0.96        1.33      0.70                        0.76       8.04      0.71
89      43       1.27        0.71      0.36                        0.06       9.95      0.71
93      43       1.79        1.07      0.73                        0.02       10.14     0.65
37      43       2.10        0.81      0.59                        0.26       10.94     0.87

¹Selection criterion: Optimal difference in alpha and beta distances (D) must fall within the range of the residue pair's average alpha-beta distance-difference (δ_x) +/− 2 × the residue pair's specific standard deviation (σ_x).
²Rating II formula: a_x²/σ_x, whereby D is the optimal distance difference, and a_x = D − δ_x + 2σ_x, for all δ_x ≥ D, and a_x = δ_x + 2σ_x − D, for all δ_x ≤ D.
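The Filter 4 calculation (alpha distance minus beta distance, range test, and Rating II) can be sketched in the same way. The function names are assumptions for this illustration, and the inputs in the usage lines are the rounded values printed in Table 4.

```python
D_MIN, D_MAX = -0.5, 2.0        # selection range for the alpha-beta difference, in Å
D = (D_MIN + D_MAX) / 2         # optimal distance difference, 0.75 Å


def alpha_beta_difference(alpha_distance, beta_distance):
    """Alpha carbon distance minus beta carbon distance for one residue pair.
    A positive value means the beta carbons are closer together than the alpha carbons."""
    return alpha_distance - beta_distance


def passes_filter_4(average_difference):
    """True when the average alpha-beta difference lies in the selection range."""
    return D_MIN <= average_difference <= D_MAX


def rating_ii(average_difference, stdev, target=D):
    """Rating II = a_x**2 / sigma_x, using the same piecewise a_x as Rating I."""
    if average_difference >= target:
        a = target - average_difference + 2 * stdev
    else:
        a = average_difference + 2 * stdev - target
    return a * a / stdev


# Pair with average difference 0.71 Å and standard deviation 0.36 Å (Table 4).
r_example = rating_ii(0.71, 0.36)
```

On the rounded table values this gives about 1.28 for that pair, against the tabulated 1.27.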

Note that the optimal alpha-alpha distance and alpha-beta distance difference (Target) also fall comfortably within the range of actually measured values of most of the residue pairs selected, as shown in Table 5. This is important, because it further underscores the likelihood that the selected candidate pairs will result in cross-linked tyrosine side chains that minimally disrupt the Fv fragment structure and function.

TABLE 5 Average, median, standard deviation, and range of actually measured alpha-alpha distances and alpha-beta distance differences. The remaining residue pairs are identified in the top two rows by their heavy and light chain K&W residue numbers.

                 Heavy     37      39      89      91      92      93      103     48
                 Light     43      43      43      43      43      43      46      98
Alpha Carbon     Average   10.94   10.34   9.95    8.04    10.21   10.14   8.98    7.65
Distance         Stdev     0.87    0.79    0.71    0.71    0.59    0.65    0.33    0.57
                 Max       13.23   12.37   11.75   9.82    11.81   11.81   9.63    8.68
                 Min       9.94    9.63    9.05    7.32    9.56    9.42    8.39    6.78
                 Median    10.81   10.10   9.80    7.92    9.99    9.95    8.95    7.89
Ca-Cb            Average   0.81    0.41    0.71    1.33    −0.23   1.07    0.81    0.87
Difference       Stdev     0.59    0.28    0.36    0.70    0.61    0.73    0.18    0.17
                 Max       1.42    0.84    1.17    2.02    0.33    1.74    1.09    1.37
                 Min       −0.64   −0.10   −0.08   −0.25   −1.86   −0.69   0.40    0.63
                 Median    1.03    0.45    0.75    1.65    0.05    1.29    0.77    0.81

6.9. Filter 5: Amino Acid Side-Chain Usage

Since residue pairs are to be substituted with tyrosine such that the substitutions are minimally disruptive to the structure and function of the resulting cross-linked complex, residue pairs were selected from among those in Tables 4 and 5 such that the properties of the original amino acid side-chains were as similar as possible to those of tyrosine. The principal side chain properties that were measured are (i) van der Waals volume and (ii) hydrophobicity. These measurements were used as proxies for the size and charge of the amino acid side chains, respectively.

At each residue, every occurring amino acid side chain was given a numeric value representing its van der Waals volume and its hydrophobicity (FIG. 12). Based on amino acid usage data for these residues (Kabat & Wu), the average and standard deviation of the residue's van der Waals volume and hydrophobicity were calculated, both weighted and un-weighted by the frequency at which the specific side chain occurs at this residue. A weighted statistical measurement is calculated on every value present in the sample (n=number of sequences in 2-D database), and an un-weighted statistical measurement is calculated on the value of each occurring amino acid (n=20 maximally) (FIG. 13).

For example, given 10 sequences in a database, whereby at a given residue alanine occurs 8 times, and leucine twice, the weighted average of the van der Waals volumes would be:

(8×ala value+2×leu value)/10=(8×67+2×124)/10=78.4.

In the same example, the un-weighted average would be:

(ala value+leu value)/2=(67+124)/2=95.5.
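The two averages in this example can be reproduced with a short sketch. The function names and the list representation of an alignment column are assumptions made for illustration; only the two volumes needed for the example are included.

```python
# Van der Waals volumes (Å^3) of the two amino acids used in the example
VDW_VOLUME = {"Ala": 67, "Leu": 124}


def weighted_average(column, values):
    """Average over every sequence in the sample (n = number of sequences)."""
    return sum(values[aa] for aa in column) / len(column)


def unweighted_average(column, values):
    """Average over each distinct amino acid that occurs (n = 20 at most)."""
    distinct = set(column)
    return sum(values[aa] for aa in distinct) / len(distinct)


# One alignment column from 10 sequences: alanine 8 times, leucine twice
column = ["Ala"] * 8 + ["Leu"] * 2
print(weighted_average(column, VDW_VOLUME))    # 78.4
print(unweighted_average(column, VDW_VOLUME))  # 95.5
```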

The numeric values of all 20 amino acids of both van der Waals volume and hydrophobicity used for the selection are listed in Table 6.

Each of the 6 residue pairs identified in the structural analysis was examined for its ability to be “conservatively” substituted with two tyrosine residues, by comparing the pair's average van der Waals and hydrophobicity scores and their standard deviations with those of a tyrosine pair.

TABLE 6 Numeric values of amino acid side chain van der Waals volumes (Richards, F. M., J. Mol. Biol. 82, 1-14, 1974) and hydrophobicity (Eisenberg, D., Ann. Rev. Biochem. 53, 595-623, 1984).

Amino Acid   Van der Waals volume [Å³]   Hydrophobicity
Ala          67                          0.62
Arg          148                         −2.50
Asn          96                          −0.78
Asp          91                          −0.90
Cys          86                          0.29
Gln          114                         −0.85
Glu          109                         −0.79
Gly          48                          0.48
His          118                         −0.40
Ile          124                         1.40
Leu          124                         1.10
Lys          135                         −1.50
Met          124                         0.64
Phe          135                         1.20
Pro          90                          0.12
Ser          73                          −0.18
Thr          93                          −0.05
Trp          163                         0.81
Tyr          141                         0.26
Val          105                         1.10

For each of the residues listed in Table 5, the average van der Waals volumes and hydrophobicity values and their standard deviations, weighted and unweighted, are listed in Tables 7 and 8, respectively.

TABLE 7 Van der Waals scores for residue pairs and comparison to a tyr-tyr pair.

Heavy                        37     39     89     91     92     93     103     48
Consensus                    VAL    GLN    VAL    TYR    CYS    ALA    TRP     VAL
weighted     Average         109    113    110    141    86     69     160     110
             Stdev           8      12     12     1      —      9      11      9
unweighted   Average         116    103    122    138    86     78     136     116
             Stdev           10     51     18     4      —      26     27      10

Light                        43     43     43     43     43     43     46      98
Consensus                    ALA    ALA    ALA    ALA    ALA    ALA    LEU     PHE
weighted     Average         72     72     72     72     72     72     124     135
             Stdev           14     14     14     14     14     14     3       2
unweighted   Average         94     94     94     94     94     94     118     128
             Stdev           24     24     24     24     24     24     11      6

Heavy                        37     39     89     91     92     93     103     48
Light                        43     43     43     43     43     43     46      98
weighted     2 × tyr value   282    282    282    282    282    282    282     282
             Comb. value¹    181    185    182    213    158    141    283     245
             Difference²     101    97     100    69     124    141    1       38
             Comb. Stdev.³   22     26     26     15     14     23     14      11
             Rating III⁴     0.21   0.27   0.26   0.21   0.11   0.16   10.39   0.28
unweighted   2 × tyr value   282    282    282    282    282    282    282     282
             Comb. value¹    210    197    216    232    180    172    253     244
             Difference²     72     85     66     50     102    110    29      39
             Comb. Stdev.³   35     75     43     29     24     50     38      17
             Rating IV⁴      0.49   0.89   0.64   0.57   0.24   0.46   1.32    0.43

¹Sum of the residue pair's average van der Waals values
²Size of the difference (square root of squared difference) between the sum of the value for two tyrosine residues (282) and the sum of the residue pair's average values (¹)
³Sum of both residues' standard deviations
⁴Formula used: Stdev/Difference (³/²)
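The derived rows of Table 7 follow directly from its footnotes, as the sketch below shows. The function name and argument order are assumptions made for illustration. The same arithmetic applies to hydrophobicity with a tyrosine value of 0.26.

```python
def substitution_rating(avg_heavy, avg_light, sd_heavy, sd_light, target_value):
    """Return (combined value, difference, combined stdev, rating) for a pair.

    target_value is the property value of the substituting amino acid
    (141 Å^3 for the van der Waals volume of tyrosine).
    """
    combined = avg_heavy + avg_light               # footnote 1
    difference = abs(2 * target_value - combined)  # footnote 2
    combined_sd = sd_heavy + sd_light              # footnote 3
    # Footnote 4: Stdev/Difference; an exact match would give an unbounded rating
    rating = combined_sd / difference if difference else float("inf")
    return combined, difference, combined_sd, rating


# Weighted values for pair 89/43: heavy 110 ± 12, light 72 ± 14
print(substitution_rating(110, 72, 12, 14, 141))  # (182, 100, 26, 0.26)
```

Because the table entries are rounded, recomputing a rating from them can differ slightly from the listed value: pair 37/43 gives 22/101 ≈ 0.22 against the listed 0.21.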

TABLE 8 Hydrophobicity scores for residue pairs and comparison to a tyr-tyr pair.

Heavy                        37     39      89     91     92     93     103    48
Consensus                    VAL    GLN     VAL    TYR    CYS    ALA    TRP    VAL
Weighted     Average         1.14   −0.86   0.90   0.30   0.29   0.58   0.79   1.14
             Stdev           0.14   0.35    0.66   0.20   —      0.19   0.30   0.11
Unweighted   Average         1.07   −0.96   0.41   0.73   0.29   0.54   0.41   1.25
             Stdev           0.27   1.49    1.37   0.66   —      0.47   1.05   0.17

Light                        43     43      43     43     43     43     46     98
Consensus                    ALA    ALA     ALA    ALA    ALA    ALA    LEU    PHE
Weighted     Average         0.50   0.50    0.50   0.50   0.50   0.50   1.08   1.20
             Stdev           0.33   0.33    0.33   0.33   0.33   0.33   0.09   0.03
Unweighted   Average         0.47   0.47    0.47   0.47   0.47   0.47   0.95   1.23
             Stdev           0.59   0.59    0.59   0.59   0.59   0.59   0.27   0.15

Heavy                        37     39      89     91     92     93     103    48
Light                        43     43      43     43     43     43     46     98
Weighted     2 × tyr value   0.52   0.52    0.52   0.52   0.52   0.52   0.52   2.34
             Comb. value¹    1.64   −0.36   1.40   0.80   0.79   1.08   1.87   1.82
             Difference²     1.12   0.88    0.88   0.28   0.27   0.56   1.35   0.13
             Comb. Stdev.³   0.46   0.69    1.00   0.53   0.33   0.53   0.38   0.07
             Rating V        0.42   0.78    1.13   1.89   1.24   0.97   0.28   0.06
Unweighted   2 × tyr value   0.52   0.52    0.52   0.52   0.52   0.52   0.52   0.52
             Comb. value¹    1.54   −0.49   0.88   1.20   0.76   1.01   1.35   2.48
             Difference²     1.02   1.01    0.36   0.68   0.24   0.49   0.83   1.96
             Comb. Stdev.³   0.87   2.09    1.97   1.26   0.59   1.07   1.32   0.33
             Rating VI⁴      0.85   2.07    5.44   1.86   2.49   2.20   1.58   0.17

¹Sum of the residue pair's average hydrophobicity values
²Size of the difference (square root of squared difference) between the sum of the value for two tyrosine residues (0.52) and the sum of the residue pair's average values (¹)
³Sum of both residues' standard deviations
⁴Formula used: Stdev/Difference (³/²)

6.10. Filter 6: Partial Elimination of Pairs with Highly Conserved Residues

All residues under consideration are within the Framework Regions of either the heavy or the light chain of Fv fragments, and can therefore be expected to be conserved. Therefore, for the purpose of this analysis, residues that are more than 80% conserved (see Table 9) are eliminated, with the exception of pairs in which an aromatic amino acid is conserved (see below).

TABLE 9 Residue amino acid identity conservation

Residue   Consensus¹   Occurrence of consensus²   Sample size, N³   No. of occurring AAs⁴   AA identity conservation⁵
Heavy Chain
37        VAL          31                         40                4                       78%
39        GLN          35                         37                3                       95%
48        VAL          30                         42                4                       71%
89        VAL          25                         40                7                       63%
91        TYR          42                         44                2                       95%
92        CYS          44                         44                1                       100%
93        ALA          37                         42                4                       88%
103       TRP          30                         33                3                       91%
Light Chain
43        ALA          49                         65                6                       75%
46        LEU          54                         57                3                       95%
98        PHE          66                         68                3                       97%

¹Most frequently occurring amino acid at the indicated residue
²Number of the consensus amino acid (¹) occurrences at the indicated residue
³Number of amino acids known for an Fv fragment at the indicated residue
⁴Number of different amino acids (AAs) occurring at the indicated residue
⁵Occurrence of the consensus amino acid (²) divided by the sample size, N (³).
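The columns of Table 9 can be derived from a single alignment column, as in the sketch below. The function name is an assumption, and the example column is hypothetical: it reproduces the counts listed for heavy chain residue 39 (35 of 37 sequences, 3 different amino acids), but the identities of the two minor amino acids are invented for illustration.

```python
from collections import Counter


def consensus_and_conservation(column):
    """Return (consensus, occurrences, sample size N, number of AAs, conservation)."""
    counts = Counter(column)
    # most_common(1) returns a single winner; ties would need separate handling
    consensus, occurrences = counts.most_common(1)[0]
    sample_size = len(column)
    return consensus, occurrences, sample_size, len(counts), occurrences / sample_size


# Hypothetical column: GLN 35 times plus two invented minor residues
column = ["GLN"] * 35 + ["ARG", "LYS"]
consensus, occurrences, n, n_aas, conservation = consensus_and_conservation(column)
print(consensus, occurrences, n, n_aas, f"{conservation:.0%}")  # GLN 35 37 3 95%
```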

Of the residues of the residue pairs of tables 4, 5, 6, 8, and 9, four pairs either do not contain a conserved aromatic amino acid, or do contain a residue that is more than 80% conserved, and are therefore eliminated.

The remaining residue pairs, which are predicted to be the optimal positions for the cross-link, are listed in Table 10 with all ratings described above.

TABLE 10 Selected potential residue pairs for the tyr-tyr cross-link to be directed to.

Residue pairs (H/L)   Rating I   Rating II   Rating III/IV   Rating V/VI
103/46                0.49       0.49        10.39/1.32      0.28/1.58
89/43                 0.06       1.27        0.26/0.64       1.13/5.44
37/43                 0.26       2.10        0.21/0.49       0.42/0.85
48/98                 0.01       0.30        0.28/0.43       0.06/0.17

6.11. Residue Pair Selection Flowchart for Software Database Assembly

Starting Material

2-D Database Import and Sorting of Data

Sequence Data

Import of 2D-polypeptide sequence data.

Define:

-   -   s=sample size (number) of sequences of the individual polypeptide chains of the protein complex (preferably in polypeptide pairs of a complex)

Alignment of data according to functional conservation (e.g. Kabat & Wu numbering system for Ig).

Define:

-   -   i (subscript)=amino acid position within the alignment system to        which any given atom belongs

Compilation of identity (three letter code) and frequency of amino acids occurring at each residue.

Define:

-   -   f_(i)=frequency of the occurrence of a particular amino acid at a given residue, i
    -   n_(i)=number of amino acids occurring at a given residue, i

Define and mark residues of both polypeptides within the conserved regions of both polypeptides (Framework Regions for Fv fragments).

Assign:

-   -   con=conserved residues
    -   non=variable residues

Assignment of consensus.

Define:

-   -   The consensus is the most frequently occurring amino acid at any        given residue of either polypeptide.

Assign:

-   -   For each residue, i,
    -   Assign the consensus using, for example, amino acid single-letter code. For residues at which two or more amino acids occur most frequently, assign all most frequently occurring amino acids.

Data on Physical Properties of Amino Acid Side-Chains

Compilation of look-up tables with amino acids and corresponding numeric values. Numeric values correspond to the most relevant physical properties of amino acid side-chains as they influence the overall structure of polypeptide complexes (e.g. side-chain volume, charge, hydrophobicity, and degrees of rotational freedom, etc.)

Define:

-   -   p (subscript): amino acid side-chain physical property chosen for the selection process
    -   N_(pi)=numeric value of a physical property corresponding to an occurring amino acid at a given residue, i

3-D Database Import and Sorting of Data

Sorting by Sequence (2-D)

Import of 3D-coordinate data of the polypeptides (from the structure of the complex as a whole).

Define:

-   -   m (subscript)=sample size (number) of different structure files imported (for both polypeptides of a complex)

Alignment of data according to functional conservation (e.g. Kabat & Wu numbering system for Ig)

Sorting by Atomic, 3-D Position

Sorting of coordinate data by amino acid residue and atom position.

Select alpha and beta carbons

Define:

-   -   Cα1_(i)=alpha carbon belonging to the first of two polypeptides
    -   Cα2_(i)=alpha carbon belonging to the second of two polypeptides
    -   Cβ1_(i)=beta carbon belonging to the first of two polypeptides
    -   Cβ2_(i)=beta carbon belonging to the second of two polypeptides
    -   Coordinates of Cα1_(i): x_(A1i), y_(A1i), z_(A1i)
    -   Coordinates of Cα2_(i): x_(A2i), y_(A2i), z_(A2i)
    -   Coordinates of Cβ1_(i): x_(B1i), y_(B1i), z_(B1i)
    -   Coordinates of Cβ2_(i): x_(B2i), y_(B2i), z_(B2i)

Assembly of Residue Pairs

Assembly of all possible inter-chain pairs of residues.

Define

-   -   j (subscript)=pair of amino acids as they fall within the above alignment system of both polypeptide chains

Compilation of Relevant Measurements; Secondary, Derivative Data

2-D derivative Data

Computation of Residue Characteristics for each Physical Property

Retrieval of numeric values of each side-chain physical property for each amino acid occurring at each residue

-   -   Match every amino acid identity at each residue in the look-up        table, and retrieve corresponding numeric values

Calculation of weighted statistical measurements for each residue.

Define:

-   -   wμ_(pi)=weighted average of the sample, s, of numeric values of a physical property at each residue, i, weighted by each occurring amino acid's frequency of occurrence, f_(i)
    -   wσ_(pi)=weighted standard deviation of the sample, s, of numeric values of a physical property at any residue, i, weighted by each occurring amino acid's frequency of occurrence, f_(i)

Calculate:

-   -   for the sample of sequences in the database, s, for each residue, i, and for each physical property, p:
        wμ_(pi) = Σ(N_(pi)*f_(i))/Σf_(i)
        wσ_(pi) = SQRT((Σf_(i)*Σ(f_(i)*N_(pi)²) − (Σ(f_(i)*N_(pi)))²)/(Σf_(i)*(Σf_(i)−1)))
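A minimal sketch of the weighted calculation follows, assuming the standard deviation is the frequency-weighted sample standard deviation (denominator Σf·(Σf − 1)). The function name and the (value, frequency) input format are assumptions made for illustration.

```python
import math


def weighted_mean_and_std(value_freq):
    """value_freq: list of (numeric value N, frequency f) for one residue.

    Requires a total frequency greater than 1, otherwise the sample
    standard deviation is undefined.
    """
    sum_f = sum(f for _, f in value_freq)
    sum_fn = sum(f * n for n, f in value_freq)
    sum_fn2 = sum(f * n * n for n, f in value_freq)
    mean = sum_fn / sum_f
    variance = (sum_f * sum_fn2 - sum_fn ** 2) / (sum_f * (sum_f - 1))
    return mean, math.sqrt(variance)


# Alanine (67 Å^3) in 8 sequences and leucine (124 Å^3) in 2 sequences
mean, std = weighted_mean_and_std([(67, 8), (124, 2)])
print(mean, round(std, 2))  # 78.4 24.03
```

The result equals the ordinary sample standard deviation of the ten individual values, which is one way to check an implementation.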

Calculation of un-weighted statistical measurements for each residue.

Define:

-   -   uμ_(pi)=un-weighted average of the sample, s, of numeric values of a physical property at any residue, i, not weighted by each occurring amino acid's frequency of occurrence, f_(i)
    -   uσ_(pi)=un-weighted standard deviation of the sample, s, of the numeric values of a physical property at any residue, i, not weighted by each occurring amino acid's frequency of occurrence, f_(i)

Calculate:

-   -   for the sample of sequences in the database, s, for each residue, i, and for each physical property, p:
        uμ_(pi) = (ΣN_(pi))/n_(i)
        uσ_(pi) = SQRT((n_(i)*ΣN_(pi)² − (ΣN_(pi))²)/(n_(i)*(n_(i)−1)))

Calculation of Each Pair's Combined Average and Standard Deviation

For both residues of each pair, the sums of the average and of the standard deviation values are calculated for each physical property.

Calculate:

For every residue pair, j, where i1 and i2 denote the residues of the pair on the first and second polypeptide:

wμ_(pj) = wμ_(pi1) + wμ_(pi2)
uμ_(pj) = uμ_(pi1) + uμ_(pi2)
wσ_(pj) = wσ_(pi1) + wσ_(pi2)
uσ_(pj) = uσ_(pi1) + uσ_(pi2)

3-D Derivative Data

Calculation of Residue Pair Inter-Atomic Alpha Carbon Distances, D_(α)

Application of Pythagorean geometry to the alpha carbon coordinates of each residue pair, j.

Calculate:

-   -   For every residue pair, j:
        D_(αj) = Sqrt((x_(A1i) − x_(A2i))² + (y_(A1i) − y_(A2i))² + (z_(A1i) − z_(A2i))²)
    -   And for the sample of structures in the database, m:
    -   μ_(αj)=Average of all D_(αj)
    -   v_(αj)=Median of all D_(αj)
    -   σ_(αj)=Standard deviation of all D_(αj)
    -   Max_(αj)=Maximum of all D_(αj)
    -   Min_(αj)=Minimum of all D_(αj)

Calculation of Difference Between Residue Pair Alpha- and Beta Carbon Distances, Δ_(j)

Application of Pythagorean geometry to residue pair beta carbon coordinates, and subtraction.

Calculate:

-   -   For every residue pair, j:
    -   D_(βj): formula as described for the alpha carbon distance measurement, with beta carbon coordinates x_(B1 and 2), y_(B1 and 2), z_(B1 and 2)
        Δ_(j) = D_(αj) − D_(βj)
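The distance and difference calculations, and the per-pair summary statistics taken over the m structures, might be sketched as follows. The function names are assumptions, the coordinates are made up for illustration, and the sample (n − 1) standard deviation is assumed.

```python
import math
import statistics


def alpha_beta_difference(ca1, cb1, ca2, cb2):
    """Return (D_alpha, D_beta, delta) for one residue pair in one structure."""
    d_alpha = math.dist(ca1, ca2)  # Euclidean distance between alpha carbons
    d_beta = math.dist(cb1, cb2)   # Euclidean distance between beta carbons
    return d_alpha, d_beta, d_alpha - d_beta


def summarize(values):
    """Average, median, standard deviation, maximum and minimum of a sample."""
    return {
        "average": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "max": max(values),
        "min": min(values),
    }


# Made-up coordinates (Å): both beta carbons point toward the other chain
d_alpha, d_beta, delta = alpha_beta_difference(
    (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0), (8.5, 0.0, 0.0)
)
print(d_alpha, d_beta, delta)  # 10.0 7.0 3.0
```

A positive difference means the beta carbons are closer together than the alpha carbons.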

And for the sample of structures in the database, m

-   -   μ_(Δj)=Average of all Δ_(j)
    -   v_(Δj)=Median of all Δ_(j)
    -   σ_(Δj)=Standard deviation of all Δ_(j)
    -   Max_(Δj)=Maximum of all Δ_(j)
    -   Min_(Δj)=Minimum of all Δ_(j)

Calculation of 3D Angles, φ_(j) and ψ_(j)

Define:

-   -   φ_(j)=angle described by the atoms (points) Cβ1_(i)-Cα1_(i)-Cα2_(i)
    -   ψ_(j)=angle described by the points Cβ2_(i)-Cα2_(i)-Cα1_(i)
    -   va1_(j)=vector from Cα1_(i) to Cα2_(i)
    -   va2_(j)=vector from Cα2_(i) to Cα1_(i)
    -   vb1_(j)=vector from Cα1_(i) to Cβ1_(i)
    -   vb2_(j)=vector from Cα2_(i) to Cβ2_(i)

Calculate:

vector coordinates, for every residue pair, j:

va1_(j)                        va2_(j)                        vb1_(j)                        vb2_(j)
x_(va1j) = x_(A2i) − x_(A1i)   x_(va2j) = x_(A1i) − x_(A2i)   x_(vb1j) = x_(B1i) − x_(A1i)   x_(vb2j) = x_(B2i) − x_(A2i)
y_(va1j) = y_(A2i) − y_(A1i)   y_(va2j) = y_(A1i) − y_(A2i)   y_(vb1j) = y_(B1i) − y_(A1i)   y_(vb2j) = y_(B2i) − y_(A2i)
z_(va1j) = z_(A2i) − z_(A1i)   z_(va2j) = z_(A1i) − z_(A2i)   z_(vb1j) = z_(B1i) − z_(A1i)   z_(vb2j) = z_(B2i) − z_(A2i)

Calculate:

-   -   Angle φ_(j) (based on scalar products), for every residue pair, j:
        $\varphi_{j} = {\arccos\left( \frac{\left( {{x_{va1j}*x_{vb1j}} + {y_{va1j}*y_{vb1j}} + {z_{va1j}*z_{vb1j}}} \right)}{{{sqrt}\left( {x_{va1j}^{2} + y_{va1j}^{2} + z_{va1j}^{2}} \right)}*{{sqrt}\left( {x_{vb1j}^{2} + y_{vb1j}^{2} + z_{vb1j}^{2}} \right)}} \right)}$
    -   And for the sample of structures in the database, m:
    -   μ_(φj)=Average of all φ_(j)
    -   v_(φj)=Median of all φ_(j)
    -   σ_(φj)=Standard deviation of all φ_(j)
    -   Max_(φj)=Maximum of all φ_(j)
    -   Min_(φj)=Minimum of all φ_(j)
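The scalar-product calculation of φ_(j) can be sketched as below; ψ_(j) is obtained from the same function with the roles of the two chains exchanged. The function name and argument order are assumptions, and reporting the angle in degrees is a choice made here, since the text does not state the unit.

```python
import math


def angle_deg(apex, p, q):
    """Angle at `apex` between the vectors apex->p and apex->q, in degrees."""
    v = [a - b for a, b in zip(p, apex)]
    w = [a - b for a, b in zip(q, apex)]
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.hypot(*v) * math.hypot(*w)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Made-up coordinates (Å)
ca1, cb1 = (0.0, 0.0, 0.0), (0.0, 1.5, 0.0)
ca2, cb2 = (10.0, 0.0, 0.0), (8.5, 0.0, 0.0)
phi = angle_deg(ca1, cb1, ca2)  # Cβ1-Cα1-Cα2, i.e. between vb1 and va1
psi = angle_deg(ca2, cb2, ca1)  # Cβ2-Cα2-Cα1, i.e. between vb2 and va2
print(phi, psi)
```

In this example the first side chain points perpendicular to the Cα-Cα axis and the second points straight along it toward the other chain.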

Calculate:

-   -   Angle ψ_(j) (based on scalar products), for every residue pair, j:
        $\Psi_{j} = {\arccos\left( \frac{\left( {{x_{va2j}*x_{vb2j}} + {y_{va2j}*y_{vb2j}} + {z_{va2j}*z_{vb2j}}} \right)}{{{sqrt}\left( {x_{va2j}^{2} + y_{va2j}^{2} + z_{va2j}^{2}} \right)}*{{sqrt}\left( {x_{vb2j}^{2} + y_{vb2j}^{2} + z_{vb2j}^{2}} \right)}} \right)}$
    -   And for the sample of structures in the database, m:
    -   μ_(ψj)=Average of all ψ_(j)
    -   v_(ψj)=Median of all ψ_(j)
    -   σ_(ψj)=Standard deviation of all ψ_(j)
    -   Max_(ψj)=Maximum of all ψ_(j)
    -   Min_(ψj)=Minimum of all ψ_(j)

Calculation of the Third 3D-Angle

Define:

-   -   Vector g1_(j) (vg1_(j)): A1_(i)-B2_(i)
    -   Plane E1_(j), described by vectors va1_(j) and vb1_(j)
    -   Plane E2_(j), described by vectors va1_(j) and vg1_(j)
    -   Vector n1_(j) (vn1_(j)), perpendicular to E1_(j), the vector product of va1_(j) and vb1_(j)
    -   Vector n2_(j) (vn2_(j)), perpendicular to E2_(j), the vector product of va1_(j) and vg1_(j)

Calculate:

vg1 coordinates, for every residue pair, j:

vg1_(j)
x_(vg1j) = x_(B2i) − x_(A1i)
y_(vg1j) = y_(B2i) − y_(A1i)
z_(vg1j) = z_(B2i) − z_(A1i)

Calculate:

-   -   vn1 and vn2 coordinates (vector products), for every residue pair, j:
        -   vn1_(j)=vector product of va1_(j) and vb1_(j)
        -   vn2_(j)=vector product of va1_(j) and vg1_(j)

vn1_(j)                                            vn2_(j)
x_(vn1j) = y_(va1j)*z_(vb1j) − z_(va1j)*y_(vb1j)   x_(vn2j) = y_(va1j)*z_(vg1j) − z_(va1j)*y_(vg1j)
y_(vn1j) = z_(va1j)*x_(vb1j) − x_(va1j)*z_(vb1j)   y_(vn2j) = z_(va1j)*x_(vg1j) − x_(va1j)*z_(vg1j)
z_(vn1j) = x_(va1j)*y_(vb1j) − y_(va1j)*x_(vb1j)   z_(vn2j) = x_(va1j)*y_(vg1j) − y_(va1j)*x_(vg1j)

Calculate:

-   -   Angle between vn1_(j) and vn2_(j), angle χ_(j), for every residue pair, j:
        $\chi_{j} = {\arccos\left( \frac{\left( {{x_{vn1j}*x_{vn2j}} + {y_{vn1j}*y_{vn2j}} + {z_{vn1j}*z_{vn2j}}} \right)}{{{sqrt}\left( {x_{vn1j}^{2} + y_{vn1j}^{2} + z_{vn1j}^{2}} \right)}*{{sqrt}\left( {x_{vn2j}^{2} + y_{vn2j}^{2} + z_{vn2j}^{2}} \right)}} \right)}$
    -   And for the sample of structures in the database, m:
    -   μ_(χj)=Average of all χ_(j)
    -   v_(χj)=Median of all χ_(j)
    -   σ_(χj)=Standard deviation of all χ_(j)
    -   Max_(χj)=Maximum of all χ_(j)
    -   Min_(χj)=Minimum of all χ_(j)

Compilation of Residue Pair Ratings; Tertiary, Derivative Data

Residue pair Ratings Based on 2-D Database

For each physical property chosen for the selection process.

Define:

-   -   T_(p)=sum of the numeric values of the physical properties of the amino acids to be substituted in both polypeptide chains (2× value of tyrosine for the tyrosine oxidative cross-link)
    -   v_(p)=allowable multiples of the weighted and un-weighted standard deviations of a physical property's values, wσ_(pj) and uσ_(pj).

Rating (R) based on numeric values of a physical property, p, corresponding to occurring amino acids, weighted by the frequency of each amino acid's occurrence.

Calculate:

-   -   For each residue pair, j        wR _(pj) =v _(p) *wσ _(pj)/(abs(T _(p) −wμ _(pj) −v _(p) *wσ        _(pj))

Rating based on numeric values of a physical property, p, corresponding to occurring amino acids.

Calculate:

-   -   For each residue pair, j
        uR_(pj) = v_(p)*uσ_(pj)/(abs(T_(p) − uμ_(pj) − v_(p)*uσ_(pj))

Residue Pair Ratings Based on 3-D Database

Alpha Carbon Spacing

Define:

-   -   V_(Rα): allowable multiples of the standard deviation of inter-chain alpha carbon distances, σ_(αj)
    -   vMax_(α): maximal value allowable for μ_(αj) in the selection process
    -   vMin_(α): minimal value allowable for μ_(αj) in the selection process
    -   T_(α): Target value for alpha carbon spacing
    -   R_(αj): Rating based on inter-chain alpha carbon spacing, scores high for residue pairs, j, with μ_(αj) values close to the target value, T_(α), and/or with high σ_(αj) values (flexibility)

Calculate:

-   -   T_(α)=average of vMax_(α) and vMin_(α)

For all residue pairs, j:

For all μ_(αj) ≧ T_(α): R_(αj) = (T_(α) − μ_(αj) + V_(Rα)*σ_(αj))²/σ_(αj)
For all μ_(αj) < T_(α): R_(αj) = (μ_(αj) + V_(Rα)*σ_(αj) − T_(α))²/σ_(αj)

φ and ψ Angles

Define:

-   -   V_(Rφψ): allowable multiples of the standard deviation of φ_(j) and ψ_(j) angles, σ_(φj) and σ_(ψj)
    -   vMax_(φ·ψ): maximal value allowable for μ_(φj) and μ_(ψj) in the selection process (same value for both angles)
    -   vMin_(φ·ψ): minimal value allowable for μ_(φj) and μ_(ψj) in the selection process (same value for both angles)
    -   T_(φ·ψ): Target value of φ and ψ angles (same value for both angles)
    -   R_(φ·ψj): Rating based on the angles φ and ψ; scores high for residue pairs, j, with μ_(φj) and μ_(ψj) values close to the target value, T_(φ·ψ), and/or with high σ_(φj) and σ_(ψj) values (flexibility)
    -   r_(φ): sub-rating based on the angle φ
    -   r_(ψ): sub-rating based on the angle ψ

Calculate:

-   -   T_(φ·ψ)=average of vMax_(φ·ψ) and vMin_(φ·ψ)

For every residue pair, j:

For all μ_(φj) ≧ T_(φ·ψ): r_(φj) = (T_(φ·ψ) − μ_(φj) + V_(Rφψ)*σ_(φj))²/σ_(φj)
For all μ_(φj) < T_(φ·ψ): r_(φj) = (μ_(φj) + V_(Rφψ)*σ_(φj) − T_(φ·ψ))²/σ_(φj)
For all μ_(ψj) ≧ T_(φ·ψ): r_(ψj) = (T_(φ·ψ) − μ_(ψj) + V_(Rφψ)*σ_(ψj))²/σ_(ψj)
For all μ_(ψj) < T_(φ·ψ): r_(ψj) = (μ_(ψj) + V_(Rφψ)*σ_(ψj) − T_(φ·ψ))²/σ_(ψj)

R_(φ·ψj) = average of r_(φj) and r_(ψj)

Difference Between Alpha- and Beta Carbon Spacing

Define:

-   -   V_(RΔ): allowable multiples of the standard deviation for each residue pair, j, of m differences between inter-chain alpha- and beta carbon distances, σ_(Δj)
    -   vMax_(Δ): maximal value allowable for μ_(Δj) in the selection process
    -   vMin_(Δ): minimal value allowable for μ_(Δj) in the selection process
    -   T_(Δ): Target value for the difference between alpha and beta carbon spacing
    -   R_(Δj): Rating based on differences between inter-chain alpha- and beta carbon distances, scores high for residue pairs, j, with μ_(Δj) values close to the target value, T_(Δ), and/or with high σ_(Δj) values (flexibility)

Calculate:

-   -   T_(Δ)=average of vMax_(Δ) and vMin_(Δ)

For all residue pairs, j:

For all μ_(Δj) ≧ T_(Δ): R_(Δj) = (T_(Δ) − μ_(Δj) + V_(RΔ)*σ_(Δj))²/σ_(Δj)
For all μ_(Δj) < T_(Δ): R_(Δj) = (μ_(Δj) + V_(RΔ)*σ_(Δj) − T_(Δ))²/σ_(Δj)

Selection Processes

The Sequence of Filters is of No Significance

I 2D Selection Processes

Filter I.1: Selection for Conserved Residues

For all residue pairs

-   If the amino acids of residue pair j are both assigned mark ‘con’ (conserved), select
-   If either amino acid of a residue pair j is assigned ‘non’ (variable), discard

Filter I.2: Selection Against Residues that have Glycine as Consensus

Selection of pairs of which neither residue is most frequently glycine, for all residue pairs:

-   If the consensus (most frequently occurring amino acid) of neither residue of a pair j is glycine, select
-   If the consensus (most frequently occurring amino acid) of either residue of a pair j is glycine, discard

Filter I.3: Selection Based on Weighted Statistical Measurements

Selection using statistical measurements of a physical property, p, of occurring amino acids at each residue, i, of every residue pair, j, weighted by the occurring amino acid's frequency of occurrence.

Define:

-   -   Max_(wRp): maximum limit for the selection of an amino acid side-chain physical property, p, based on weighted statistical measurements
    -   Min_(wRp): minimum limit for the selection of an amino acid side-chain physical property, p, based on weighted statistical measurements

Calculate:

-   -   IF [Min_(wRp)<wR_(pj)<Max_(wRp)] is True, select
    -   IF [Min_(wRp)<wR_(pj)<Max_(wRp)] is False, discard

Filter I.4: Selection Based on Un-Weighted Statistical Measurements

Selection using statistical measurements of a physical property, p, of occurring amino acids at each residue, i, of every pair, j, not weighted by the occurring amino acid's frequency of occurrence.

Define:

-   -   Max_(uRp): maximum limit for the selection of an amino acid side-chain physical property, p, based on un-weighted statistical measurements
    -   Min_(uRp): minimum limit for the selection of an amino acid side-chain physical property, p, based on un-weighted statistical measurements

Calculate:

-   -   IF [Min_(uRp)<uR_(pj)<Max_(uRp)] is True, select
    -   IF [Min_(uRp)<uR_(pj)<Max_(uRp)] is False, discard

II 3D Selection Process

Filter II.1: Selection for Average Alpha-Carbon Distances within Selection Range

Calculation:

-   -   For all residue pairs:
    -   IF [vMin_(α)<μ_(αj)<vMax_(α)] is True, select
    -   IF [vMin_(α)<μ_(αj)<vMax_(α)] is False, discard

Filter II.2: Selection for Sufficient Flexibility of Alpha Carbon Spacing

Calculation:

-   -   For all residue pairs:
    -   For all μ_(αj)<T_(α)
    -   IF [μ_(αj)+V_(Rα)*σ_(αj)>T_(α)]=True, select
    -   IF [μ_(αj)+V_(Rα)*σ_(αj)>T_(α)]=False, discard
    -   For all μ_(αj)>T_(α)
    -   IF [μ_(αj)−V_(Rα)*σ_(αj)<T_(α)]=True, select
    -   IF [μ_(αj)−V_(Rα)*σ_(αj)<T_(α)]=False, discard
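Filters II.1 and II.2 reduce to two short predicates, sketched below; the same pair of tests applies to the alpha-beta distance difference in Filters II.4 and II.5. The function names are assumptions, and the limits used in the example (6 Å and 12 Å) are hypothetical values chosen for illustration, not the values used in the example sections.

```python
def within_range(mean, v_min, v_max):
    """Filter II.1: the average must lie strictly inside the selection range."""
    return v_min < mean < v_max


def flexible_enough(mean, stdev, target, multiple):
    """Filter II.2: the target must lie within `multiple` standard deviations.

    Covers both branches: mean + multiple*stdev > target when mean < target,
    and mean - multiple*stdev < target when mean > target.
    """
    return abs(mean - target) < multiple * stdev


# Hypothetical selection range; the target is the average of its limits
v_min, v_max = 6.0, 12.0
target = (v_min + v_max) / 2

# (average, stdev) of the alpha carbon distances for two pairs of Table 5
for pair, (mean, stdev) in {"103/46": (8.98, 0.33), "37/43": (10.94, 0.87)}.items():
    selected = within_range(mean, v_min, v_max) and flexible_enough(mean, stdev, target, 2)
    print(pair, selected)
```

With these hypothetical limits, pair 103/46 passes both filters, while pair 37/43 passes the range test but fails the flexibility test.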

Filter II.3: Selection for Pairs with φ and ψ Angles within the Selection Range

Calculation:

-   -   IF [vMin_(φ·ψ)<μ_(φj)<vMax_(φ·ψ)] AND [vMin_(φ·ψ)<μ_(ψj)<vMax_(φ·ψ)] is True, select
    -   IF [vMin_(φ·ψ)<μ_(φj)<vMax_(φ·ψ)] AND [vMin_(φ·ψ)<μ_(ψj)<vMax_(φ·ψ)] is False, discard

Filter II.4: Selection for Average Differences Between Alpha- and Beta Carbon Distances within Selection Range

-   -   μ_(Δj)=average difference between residue alpha carbon and beta        carbon distances

Calculation:

-   -   For all residue pairs:
    -   IF [vMin_(Δ)<μ_(Δj)<vMax_(Δ)] is True, select
    -   IF [vMin_(Δ)<μ_(Δj)<vMax_(Δ)] is False, discard

Filter II.5: Selection for Sufficient Flexibility of the Pairs' Difference Between Alpha and Beta Carbon Distances

Calculation:

-   -   For all residue pairs:
    -   For all μ_(Δj)<T_(Δ)
    -   IF [μ_(Δj)+V_(RΔ)*σ_(Δj)>T_(Δ)]=True, select
    -   IF [μ_(Δj)+V_(RΔ)*σ_(Δj)>T_(Δ)]=False, discard
    -   For all μ_(Δj)>T_(Δ)
    -   IF [μ_(Δj)−V_(RΔ)*σ_(Δj)<T_(Δ)]=True, select
    -   IF [μ_(Δj)−V_(RΔ)*σ_(Δj)<T_(Δ)]=False, discard

Final Selection

Selected Amino Acid Pairs

All residue pairs, j, that are selected in all Filters (I.1-4 and II.1-6) are compiled and listed.

Sort and Select by Ratings

All listed residue pairs are compared by their Ratings, and the pair with the highest Ratings is the FINAL SELECTION.

6.12. Point Mutagenesis and Sub-Cloning into Expression Vectors

6.12.1. Conservative Substitutions for Undesired Tyrosine Residues

cDNA fragments encoding the Fv fragment heavy and light chains of the monoclonal anti-α5-integrin antibody (example 1), or the monoclonal anti-β1-integrin antibody (example 2) are isolated from the hybridomas that produce them according to standard procedures known in the art. For example, RNA is isolated from the pellet of a suspension culture of hybridoma cells, the RNA is reverse transcribed using a mixture of poly-A and random primers, and cDNAs of the heavy and light chains are isolated by the RACE method. The sequences of the heavy and light chains that are to be cross-linked according to the procedures of the instant invention are identified by standard procedures, and aligned with the K&W numbering system. Tyrosine residues identified are examined for their predicted proximity and positional flexibility toward each other. Residue pairs at which reactive side chains are found in the sequence, and that are either within an average of 15 Å or less in the sample, or that have an average and standard deviation such that the average less one standard deviation is 15 Å or less in the sample, are identified. Of these pairs, the residue of the pair at which tyrosine occurs at the lowest frequency in the 2-D Database is point mutated to phenylalanine. Point mutations are introduced by using the QuikChange™ Site-Directed Mutagenesis Kit (Stratagene, Catalog # 200518).
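The proximity test and the choice of which tyrosine to mutate can be sketched as follows. The function names, the residue labels and the tyrosine frequencies are hypothetical and serve only to illustrate the rule. The text does not state which inter-atomic distance is meant, so the inputs are simply "the" average distance and its standard deviation.

```python
def may_cross_react(mean_distance, stdev, cutoff=15.0):
    """True if the average, or the average less one stdev, is within the cutoff (Å)."""
    return mean_distance <= cutoff or mean_distance - stdev <= cutoff


def residue_to_mutate(pair, tyr_frequency):
    """Pick the residue of the pair at which tyrosine is least frequent."""
    return min(pair, key=lambda residue: tyr_frequency[residue])


# Hypothetical tyrosine pair and 2-D database frequencies
pair = ("H95", "L36")
tyr_frequency = {"H95": 0.12, "L36": 0.88}

if may_cross_react(mean_distance=15.6, stdev=0.9):
    print("mutate to Phe:", residue_to_mutate(pair, tyr_frequency))  # H95
```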

6.12.2. Substitution of Residues of a Selected Pair with Tyrosine

At the residues of the pair selected, as described above, amino acid substitutions are introduced by point mutation, so far as tyrosine is not already present at the selected residues of the pair in the sequences of the heavy and light chains of the Fv fragment to be stabilized. Point mutations are introduced by using the QuikChange™ Site-Directed Mutagenesis Kit (see above).

6.12.3. Expression Vector and System

DNA fragments encoding the Fv fragment heavy and light chains, all containing the conservative amino acid substitutions for undesired tyrosine residues, identified as described above, with and without the amino acid substitutions of residues of the selected pair with tyrosine, are isolated. The isolated fragments (inserts) are subcloned into a pGEX expression vector containing the TEV-protease cleavage site. For the purposes of measuring the Fv fragment's retained affinity for its antigen, the insert encoding the heavy chain is also fused with a nucleotide sequence encoding a Hemagglutinin (HA)-tag at the 3′ end (C-terminus of the protein), for which a secondary antibody is commercially available. For the purposes of using the Fv fragment in diagnostic, therapeutic, or any other commercial applications, however, the HA-tag should be removed again. Subcloning is carried out by standard procedures known in the art.

6.13. Fv Fragment Bacterial Expression and Purification

The above-described expression plasmids encoding modified heavy and light Fv fragments are transformed into competent BL21 or XA90 bacteria. Frozen glycerol stocks (0.5 ml) are prepared from individual ampicillin resistant clones, with which expression cultures (e.g. 1000 ml Luria Broth: 10 gm tryptone, 5 gm yeast extract, 5 gm NaCl, containing 100 μg/ml ampicillin) are inoculated. The cells are grown at 30° C. on a rotary shaker (300 rpm), and protein expression is induced with 1 mM IPTG at an OD600 of 0.6. Following a three hour incubation, bacteria are harvested by centrifugation at 4000 g at 4° C. The pellet is resuspended with ice-cold 50 ml Lysis Buffer (20 mM Tris.Cl pH 7.9, 500 mM NaCl, 10% glycerol, 20 mM β-mercaptoethanol, 1 mM PMSF, 20 μg/ml leupeptin, 20 μg/ml pepstatin, 1% aprotinin) and then sonicated on ice until lysis is >90% complete. Insoluble matter is removed by centrifugation at 20,000 g at 4° C. for 20 min. The supernatant is then incubated with 2 ml Glutathione sepharose (Pharmacia) for 2 hrs at 4° C. The beads are then pelleted by centrifugation at 4000 g, and washed (re-suspended and pelleted) twice in 10 ml Lysis Buffer and twice in 10 ml TEV-protease Cleavage Buffer (Novagen). The beads are then incubated with 1 μg His-tagged TEV protease (Novagen) at 30° C. for 1 hr in 2 ml Cleavage Buffer. The protease is subsequently removed by adding 0.1 ml equilibrated NTA-agarose (Qiagen) slurry to the suspension. Partially purified FvH and FvL fragments are present in the supernatant following centrifugation at 4000 g.

6.14. Introduction of the Oxidative Tyrosyl-Tyrosyl Cross-Link

The Fv fragment heavy and light chain gene products containing only the mutations of undesired reactive tyrosine residues to phenylalanine, without the mutations of the selected residue pair to tyrosine, are partially purified and equilibrated by dialysis in phosphate buffered saline (PBS) before mixing them at equal molarity (0.1-1000 μM). The catalyst, metalloporphyrin 20-tetrakis(4-sulfonatophenyl)-21H,23H-porphine manganese (III) chloride (MnTPPS), is then added on ice to the reaction to a concentration of 1 μM, 5 μM, 10 μM, 50 μM and 100 μM. The reaction is then initiated by the addition of the oxidant potassium mono-persulfate to a concentration of 1-100 μM, at room temperature or otherwise, for each of the concentrations of the catalyst, and at several protein concentrations. After 45 seconds the reaction is quenched by the addition of Tris.Cl pH 7.9 to 50 mM and β-mercaptoethanol to 10 mM, and the solution is again dialyzed against PBS to remove the catalyst, oxidizing and reducing agents. Cross-linked and non-cross-linked hetero-dimers and monomers are isolated by gel filtration FPLC. The efficiency of the cross-link reaction is tested by non-reducing PAGE and Coomassie blue staining.

At each protein concentration, the maximal concentration of oxidizing reagent and catalyst at which a cross-link between the polypeptides of the reaction does not form is noted. These conditions are used to catalyze the reaction between the Fv fragment heavy and light chain gene products containing both the mutations of undesired reactive tyrosine residues to phenylalanine, and the mutations of the selected residue pair to tyrosine. Cross-linked and non-cross-linked hetero-dimers and monomers are isolated by gel filtration FPLC. The efficiency of the cross-link reaction is tested by non-reducing PAGE and Coomassie blue staining.

6.15. Testing the Stabilized Complex

6.15.1. Yield of Functionally Stabilized Fv Fragment Complex

Yield of functionally cross-linked Fv fragments is tested by passing a carefully determined amount of cross-linked and glycerol gradient-purified Fv fragment protein over an immobilized antigen column, and comparing the flow-through with the starting material and the eluate of the column. Protein concentration measurements are carried out by standard procedures, such as Bradford or Lowrie assays (Bradford, 1976, and Lowrie, 1954), Coomassie or silver staining, or Western blotting.

6.15.2. Retained Affinity

Fv fragments that are successfully cross-linked under the various conditions described above are tested for their retained affinity in ELISA-type procedures. Using 96-well plates, the inside surfaces of the ELISA-assay plate wells are coated with antigen, for example integrin α5 (Example 1) and integrin β1 (Example 2). The wells are washed, and with respect to one another, half the concentration of the full length antibody and an equal molar concentration of the F(ab) fragment of the antibody (see below) as positive controls, and the Fv fragment of the antibody, cross-linked as described above, are incubated in PBS for two hours at 37° C. in serial dilutions in the wells coated with the respective antigen on one plate. F(ab) fragments are derived by pepsin digestion of the full length antibody and subsequent purification, first by removal of the Fc fragments by running the antibody/protease solution through a Protein A column, and second by fractionating the flow-through of the Protein A column by ion exchange FPLC to remove the protease. The wells are washed four times with 200 μl of PBS and the anti-HA tag and alkaline phosphatase-coupled secondary antibody are sequentially incubated in PBS for an additional hour at 37° C. Wells are washed again four times with 200 μl of PBS. The concentrations of bound IgG, F(ab) fragment, and Fv fragment are determined by standard procedures with an ELISA assay reader.

6.15.3. Stability in Serum, Lysate, and the Cytoplasm

Stability of the complex in serum is tested in time-course experiments by incubating the complex in human serum at 37° C., 38° C., 39° C., 40° C., 42° C., and 45° C. for up to two weeks, and testing for the remaining levels of functional Fv fragment complexes. As controls, the stabilities of Fab, scFv's and/or dsFv's, all tagged with the same marker, are compared.

Stability of the complex in the cytoplasm is tested, also in time-course experiments, analogously to the incubation in serum, by incubating the complex in cell lysates. More directly, the stability of the complex in the cytoplasm is tested by scrape-loading tissue culture cells with stabilized Fv fragments and assaying for the remaining levels of functional complexes. As controls, the stabilities of scFv's and dsFv's of the same original immunoglobulin molecule, both tagged with the same marker as the cross-linked Fv fragment, are compared.

In all of these experiments, the remaining levels of functional complexes will be determined in ELISA assays with the same secondary antibody, as described above.

6.15.4. Immunogenicity

Mice are injected with various doses, ranging from 1 μg to 10 mg, of stabilized complex. Stabilized complex is injected in the presence and absence of Freund's (Complete) Adjuvant. Further injections are given to the mice as boosts every five days (in the presence and absence of Incomplete Adjuvant). The mice receive a total of three or four boost-immunizations.

Tail-vein blood samples are taken before each injection, and one week after the final boost. Blood samples are spun at 3000 g for 30 min at 4° C.

ELISA plates are coated with the stabilized complex and a mixture of the unstabilized Fv fragment heavy and light chains, and ELISA assays are performed according to standard procedures, using a labeled anti-mouse secondary antibody.

The immunogenicity of complexes stabilized by the methods of the instant invention is compared to that of dsFv and scFv constructs of the same original immunoglobulin molecule as controls.

6.15.5. Biodistribution

¹⁸F radiolabeled stabilized Fv fragments, labeled according to the procedures published by Lang L. and Eckelmann U., 1994, are injected into mice. Each mouse is injected with 3 μg of Fv fragment complex at roughly 4.5 MBq/μg. Injected animals are sacrificed at 15, 45, 90, and 360 min and at 24 h, and immediately exsanguinated by cardiac puncture. Tissues are separated, dried and weighed on an analytical balance, and counted in a gamma-radiation counter using a high energy setting (for ¹⁸F). Aliquots of blood are also dried and counted. Counts are corrected for decay. Tissue:blood ratios, and the percentage of injected dose per gram tissue, are calculated for each tissue.
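The decay correction and the two derived quantities named above can be sketched as follows. This is a minimal illustration, not the procedure of the cited reference: the function names are hypothetical, the 109.77 min half-life of ¹⁸F is a physical constant supplied here, and counts are assumed to be background-subtracted and measured on the same counter as an aliquot of the injected dose.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18 in minutes (assumed constant)


def decay_correct(counts, elapsed_min, half_life_min=F18_HALF_LIFE_MIN):
    """Scale measured counts back to the time of injection."""
    return counts * math.exp(math.log(2) * elapsed_min / half_life_min)


def percent_injected_dose_per_gram(tissue_counts, tissue_mass_g, injected_counts):
    """Percentage of the injected dose found per gram of tissue.

    Both count values must already be decay-corrected to the same time point.
    """
    return 100.0 * tissue_counts / injected_counts / tissue_mass_g


def tissue_to_blood_ratio(tissue_counts, tissue_mass_g, blood_counts, blood_mass_g):
    """Ratio of counts per gram of tissue to counts per gram of blood."""
    return (tissue_counts / tissue_mass_g) / (blood_counts / blood_mass_g)


# Hypothetical readings for one tissue sample counted 45 min after injection.
corrected = decay_correct(5000.0, elapsed_min=45.0)
dose_fraction = percent_injected_dose_per_gram(corrected, 0.5, 1_000_000.0)
```

Because tissue and blood from one animal are counted at nearly the same time, the tissue:blood ratio is largely insensitive to the decay correction, whereas the percentage of injected dose per gram is not.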

Early-phase blood clearance studies are performed in mice injected with the same amount of the above-described ¹⁸F radio-labeled stabilized Fv fragments. Serial tail-vein blood samples are taken at 1, 2, 5, 10, 15, and 30 min. The samples are dried and counted as described above, and the half-life of the Fv fragments in blood is calculated according to standard procedures (Choi C. W. et al. Cancer Research; vol. 55: pp. 5323-5329, 1995).
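One common way to estimate a blood half-life from serial samples is a log-linear least-squares fit. The sketch below assumes single-exponential clearance over the sampled interval, which may not match the model used in the cited reference; the function name and the sample values are hypothetical.

```python
import math


def blood_half_life(times_min, counts):
    """Fit ln(counts) = ln(c0) - k * t by least squares and return ln(2) / k in minutes.

    Counts are assumed to be decay-corrected and strictly positive.
    """
    n = len(times_min)
    logs = [math.log(c) for c in counts]
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    slope = sxy / sxx  # equals -k for a clearing tracer
    return -math.log(2) / slope


# Sampling times from the protocol; the counts are synthetic and follow
# an exact 10 min half-life so the fit can be checked by eye.
sample_times = [1, 2, 5, 10, 15, 30]
sample_counts = [1000.0 * 0.5 ** (t / 10.0) for t in sample_times]
estimated_half_life = blood_half_life(sample_times, sample_counts)
```

If the clearance is visibly biphasic on a semi-log plot, a single fitted slope will blend the two phases, and a two-compartment model would be the more appropriate choice.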

As controls for the above studies, single chain and disulfide Fv fragment constructs of the same original immunoglobulin molecule are compared.

7. EXAMPLE II Candida Antarctica Lipase B (CALB)

The following example illustrates certain variations of the methods of the invention for protein and protein complex stabilization. This example is presented by way of illustration and not by way of limitation to the scope of the invention.

Introduction

Several polypeptides with significant commercial value have been identified in recent years, and furthermore, for many of these polypeptides structural data is available. The following section describes methods of stabilizing one polypeptide, a biocatalyst, for which data is available only for the polypeptide itself, but not for other, structurally related polypeptides. Specifically, described below are the residue pair selection process, introduction of point mutations, expression of the polypeptides and their purification and deglycosylation, the cross-link reaction itself, and analysis of the resulting stabilized biocatalyst; for the description of the adjustment of the cross-link reaction conditions, refer to Chapter 6. Furthermore, the combination of the dityrosine stabilization technology with a complementary technology, a directed evolution approach, is described.

The biocatalyst stabilized in the example below is the lipase B of Candida antarctica (“CALB”, FIGS. 1C, 15A), an enzyme for which multiple commercially relevant applications are possible due to its exquisite enantioselectivity, some of which are still uneconomic due to its lack of stability under adverse reaction conditions.

The structure file 1LBS containing the three-dimensional atomic coordinates of the polypeptide's crystal structure is obtained from the Brookhaven National Laboratory Protein Database. The derivative data relevant to the selection process is calculated as described. The selection process is carried out using a set of filters that is convenient and appropriate for this application of the instant invention.

Point mutations to tyrosine (directing the cross-link reaction) are introduced according to the final selection of the selection process, as described. The polypeptide is expressed in Pichia pastoris as a yeast alpha factor fusion protein, which directs the secretion of the fusion protein. The protein is affinity purified by its C-terminal His(6) tag, using an NTA column.

The minimally required reaction conditions are adjusted as described in Chapter 6. The cross-link efficiency of the reaction is tested, and the resulting stabilized biocatalyst is then tested for retained activity and specificity, and for improved stability over time and under adverse conditions.

Advantages of the Tyrosyl-Tyrosyl Cross-Link for Biocatalysts

The underlying chemistry of the technology covered by the present invention causes an oxidative cross-link to form between reactive side-chains of polypeptides that form stable complexes. The dityrosine bond is stable under a broad range of pH and redox conditions. The cross-link reaction requires close proximity between the reactive side-chains that will cross-link.

Thus, the current invention describes a new technology that allows stabilization of biocatalysts and enables their use in a broader range of industrial applications. This technology is designed to improve on preceding technologies and to complement compatible ones.

The resultant stabilized biocatalysts will have the following characteristics:

-   -   1. The enzymes will be more stable under a broad range of reaction conditions, including, but not limited to, temperature, pH, pressure, salinity, or concentration of other compounds in the reaction, such as a reducing agent, which is often a component of the chemical reaction for which the catalyst is required.
    -   2. The resultant cross-linked and stabilized biocatalyst will retain its activity and specificity due to the specificity of the cross-link reaction and to the selection process.

This stabilization technology is well suited for the development of new products with novel applications, the improvement of existing industrial biocatalysts, and the complementation of existing technologies for the development of novel biocatalysts.

Biocatalyst Applications

Biocatalytic enzymes constitute the preferred class of catalysts for industrial processes due to their high specificity and turnover rates, and their low development costs and cycle times. However, their utility is limited by the relative instability and limited shelf-life of protein molecules, which is exacerbated under adverse reaction and/or storage conditions. The technology of this invention can be applied to stabilize biocatalysts, thereby enhancing their utility and broadening their commercial application.

Application of the instant invention stabilizes enzymes with specifically placed internal cross-links, and thereby increases the stability of enzymes without impairing their activity in the desired reaction conditions. The resulting increase in enzyme stability thus not only addresses shelf-life limitations but also increases the enzymes' reaction rates and process yields.

Industrial biocatalytic processes are used in many industry sectors, including the chemical, detergent, pharmaceutical, agricultural, food, cosmetics, textile, materials-processing, and paper industries. Within these industries, biocatalysts have many applications, ranging from product synthesis (e.g. amino acid manufacturing, and fine chemical synthesis of small-molecule pharmaceuticals) through use as active agents in products (for example, in biological washing powders) to use in diagnostic testing equipment. Biocatalysts also have industrial applications that range from wastewater and agricultural soil treatment to crude oil refinement (e.g. desulfurization).

Thus, the example of an application of the instant invention described below focuses on a problem of wide relevance, and promises to contribute significantly to the US scientific and technical knowledge base.

Selection of Optimal Residues for Tyrosyl-Tyrosyl Cross-Link

The selection process consisted of a series of tests or ‘filters’ aimed at successively narrowing down the residue pairs most likely to result in a cross-linked tyrosine pair that minimally alters the activity or specificity of the enzyme, while lending maximal stability.

Data Used for the Analysis

Coordinate data for distance calculations of all atoms other than hydrogens of CALB were downloaded from the protein structure database of Brookhaven National Laboratory (www.bnl.pdb.gov; FIG. 5). These data provide the three-dimensional coordinates (x, y, and z) for each atom in the solved structure, expressed in metric units, i.e. Ångströms (10⁻¹⁰ m, Å). These data also contain the amino acid sequence of the polypeptide. With these data it was possible to calculate the three-dimensional distances between any desired atoms (e.g. alpha and beta carbon atoms).

Selection Methodology

Optimal residues, to which the cross-link reaction is directed, were selected by a series of filters based on the measurements of values in a database compiled for the purposes of this selection. This database contains numeric measurements of (1) alpha carbon spacing, (2) beta carbon spacing and the difference between the alpha and beta distances, and (3) residue amino acid usage (see below).

Filter 1: Selection of Sufficiently-Spaced Aromatic Residues

Because there are a significant number of aromatic residues available in the sequence of CALB, and because mutation of an aromatic residue (other than tyrosine, i.e. tryptophan, phenylalanine, or histidine) to tyrosine would be maximally conservative, for the selection process of this example, only aromatic residue pairs were analyzed. Furthermore, to maximize the degree to which application of the instant invention stabilizes the enzyme, only pairs that are spaced more than 40 amino acids apart in the two-dimensional amino acid sequence are selected.

TABLE 11 Aromatic residue pairs with alpha carbon distances within the range of 5.70 Å to 9.74 Å, spaced more than 20 residues apart.

CALB residue pair     Alpha carbon distance (Å)     Cα-Cβ Distance Difference (Å)
Phe9    Tyr82         9.29                          −0.20
Phe48   Trp104        8.85                          1.53
Trp52   Tyr234        8.71                          0.02
Phe131  Tyr183        6.19                          −1.31
Trp104  His224        9.33                          0.33
Tyr135  Tyr203        7.58                          0.10
Tyr183  His224        8.20                          −1.09
Phe117  Tyr300        7.7                           2.07
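The first filter can be sketched as a simple check on residue labels. This is a minimal illustration under stated assumptions: the helper names are hypothetical, residues are written as a three-letter code followed by the sequence position, the aromatic set follows the residues named in the text (including histidine), and the threshold is the "more than 40" spacing given above.

```python
import re

# Residue types treated as aromatic in this example, following the text.
AROMATIC = {"Phe", "Tyr", "Trp", "His"}


def parse_residue(label):
    """Split a label such as 'Phe9' into ('Phe', 9)."""
    match = re.fullmatch(r"([A-Za-z]{3})(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    return match.group(1), int(match.group(2))


def passes_filter_1(res_a, res_b, min_separation=40):
    """True if both residues are aromatic and more than min_separation apart in sequence."""
    name_a, pos_a = parse_residue(res_a)
    name_b, pos_b = parse_residue(res_b)
    both_aromatic = name_a in AROMATIC and name_b in AROMATIC
    return both_aromatic and abs(pos_a - pos_b) > min_separation


# Residue pairs as listed in Table 11.
TABLE_11_PAIRS = [
    ("Phe9", "Tyr82"), ("Phe48", "Trp104"), ("Trp52", "Tyr234"),
    ("Phe131", "Tyr183"), ("Trp104", "His224"), ("Tyr135", "Tyr203"),
    ("Tyr183", "His224"), ("Phe117", "Tyr300"),
]
all_pass = all(passes_filter_1(a, b) for a, b in TABLE_11_PAIRS)
```

The closest pair in sequence, Tyr183 and His224, is 41 positions apart, so every pair in Table 11 satisfies the stricter of the two spacing figures quoted in the text.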

Filter 2: Identification of Appropriately-Spaced Residue Pairs

To find residue pairs spaced appropriately for a tyrosyl-tyrosyl bond, the alpha carbon to alpha carbon distance between every residue pair in the polypeptide was calculated in a 3D database. This calculation was performed by applying Pythagorean geometry to the 3D coordinates of the alpha carbons (FIG. 6). Based on the calculations above, as a second cut, all residue pairs were selected whose alpha carbons are spaced within the selection range.

Because of the lack of statistical measurements that give insight into positional flexibility, the selection range was reduced by 2 Å, but only on the upper limit.

The range that was selected for was the following:

-   -   Min 5.70 Å, Max 9.74 Å.
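The distance calculation and range check of this filter can be sketched as follows. The coordinates shown are made-up values for illustration, not atoms from the CALB structure file, and the function names are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math

# Selection range for alpha carbon spacing, in Ångströms, as given in the text.
CA_MIN = 5.70
CA_MAX = 9.74


def ca_distance(coord_a, coord_b):
    """Euclidean (Pythagorean) distance between two (x, y, z) coordinates."""
    return math.dist(coord_a, coord_b)


def passes_filter_2(distance):
    """True if the alpha carbon spacing lies within the selection range."""
    return CA_MIN <= distance <= CA_MAX


# Illustrative coordinates only: sqrt(2^2 + 3^2 + 6^2) = 7.0 Å.
example_distance = ca_distance((0.0, 0.0, 0.0), (2.0, 3.0, 6.0))
example_selected = passes_filter_2(example_distance)
```

Applied to every residue pair of a 317-residue chain such as CALB, this amounts to roughly fifty thousand distance evaluations, which is trivial to compute exhaustively.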

Filter 3: Side-Chain Orientation

In the space that the polypeptide occupies, the tyrosine side chains should be oriented toward each other for a cross-link to form with minimal structural distortion. The difference between the alpha carbon distance (i.e. the backbone carbon distance; FIG. 6) and the beta carbon distance (i.e. the distance between the first carbons in each side chain; FIG. 8) of each residue pair was calculated as a proxy, i.e. an estimate of the orientation of the side chains relative to each other (FIG. 9).

The range that was selected for was the following:

-   -   Min −2 Å, Max 3.0 Å.

Again, based on 3D coordinate geometry, for each residue pair, the distance between the beta carbons was calculated (FIG. 8). The beta distance was then subtracted from the alpha distance of the residue pair (FIG. 9). This filter was based on whether the difference in the alpha and beta distances of a residue pair falls within the estimated optimal range. In this example, all of the residue pairs in Table 11 met this criterion.
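The orientation filter can be checked directly against the values reported in Table 11. In this sketch the function names are hypothetical; the range limits and the tabulated differences are taken from the text.

```python
# Selection range for the alpha-minus-beta distance difference, in Ångströms.
DIFF_MIN = -2.0
DIFF_MAX = 3.0


def orientation_difference(alpha_distance, beta_distance):
    """Alpha carbon distance minus beta carbon distance for one residue pair."""
    return alpha_distance - beta_distance


def passes_filter_3(difference):
    """True if the difference lies within the estimated optimal range."""
    return DIFF_MIN <= difference <= DIFF_MAX


# (residue pair, alpha carbon distance, Cα-Cβ distance difference) from Table 11.
TABLE_11 = [
    ("Phe9-Tyr82", 9.29, -0.20),
    ("Phe48-Trp104", 8.85, 1.53),
    ("Trp52-Tyr234", 8.71, 0.02),
    ("Phe131-Tyr183", 6.19, -1.31),
    ("Trp104-His224", 9.33, 0.33),
    ("Tyr135-Tyr203", 7.58, 0.10),
    ("Tyr183-His224", 8.20, -1.09),
    ("Phe117-Tyr300", 7.7, 2.07),
]
all_pairs_pass = all(passes_filter_3(diff) for _, _, diff in TABLE_11)
```

A positive difference means the beta carbons are closer together than the alpha carbons, i.e. the side chains point toward each other; the lower bound of −2 Å tolerates pairs whose side chains point moderately apart.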

Filter: Partial Elimination of Pairs with Residues in Proximity to the Active Site of the Enzyme

The functionality of an enzyme as a biocatalyst lies in its ability to catalyze a chemical reaction. The activity and selectivity of a catalyst are most sensitive at those sites where the catalyst and the reactants physically contact each other. Therefore, mutations and/or cross-links are least desirable in the active site, and residues in or proximal to the active site are excluded.

His224 is in the active site, and is therefore excluded. Because Tyr183 is in close proximity to His224, the selected residues below should be mutated to generate polypeptides with tyrosine pairs, with and without the mutation of Tyr183 to Phe183. Furthermore, because His224 is also in close proximity to Trp104, and because Trp104 is in close proximity to Phe48, residue pairs containing the above residues are also excluded. The remaining residue pairs are listed in Table 12 below.

TABLE 12 List of remaining residue pairs with relevant distance measurements.

CALB residue pair     Alpha carbon distance (Å)     Cα-Cβ Distance Difference (Å)     Epsilon carbon distance* (Å)
Phe117  Tyr300        7.7                           2.07                              4.59
Trp52   Tyr234        8.71                          0.02                              7.00
Tyr135  Tyr203        7.58                          0.10                              9.08
Phe9    Tyr82         9.29                          −0.20                             9.31

*In Trp52, Epsilon N1 is used.

Analysis of Epsilon Carbon Distances

Because the most likely isomer of the di-tyrosine bond is thought to be the epsilon-epsilon bond, and because coordinate data for an epsilon position atom of all of the amino acids selected is available, the distances between the epsilon positions of the above selected residue pairs in Table 12 were analyzed.

The pairs in Table 12 are ranked according to their epsilon carbon distances. However, since in three of the four pairs a point mutation is required to generate a tyrosine pair, these distances may be altered, and all of the pairs are generated and examined.
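The ranking by epsilon carbon distance can be reproduced with a simple sort. The variable names are hypothetical; the distances are those listed in Table 12.

```python
# (residue pair, epsilon carbon distance in Å) from Table 12.
# For Trp52 the epsilon N1 atom is used, as noted in the table footnote.
TABLE_12 = [
    ("Phe9-Tyr82", 9.31),
    ("Phe117-Tyr300", 4.59),
    ("Tyr135-Tyr203", 9.08),
    ("Trp52-Tyr234", 7.00),
]

# Shortest epsilon-epsilon distance first.
ranked_pairs = sorted(TABLE_12, key=lambda row: row[1])
ranked_names = [name for name, _ in ranked_pairs]
```

Only Tyr135-Tyr203 is already a tyrosine pair in the wild-type sequence, which is why the ranking is treated as provisional for the other three pairs.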

Generating Proteins Containing the Selected Point Mutations

Vector Construction of pPal-CALB

The C. antarctica lipase B gene (plasmid pMT1335) is isolated by polymerase chain reaction (PCR), omitting the pre-propeptide sequence, according to standard procedures known in the art, using the plasmid pMT1335 (Patkar et al. Chem. & Phys. of Lipids, 1998. Vol. 93, pp. 95-101) as a template. The lipase gene is amplified using the primers A and B (see FIG. 15B) for the introduction of an EcoRI (and a His(6)-tag) and a NotI site at the 5′- and 3′-end, respectively. The PCR product and the vector pPICZalphaA (Invitrogen) are digested with the restriction enzymes EcoRI and NotI, and gel purified using the Qiaex II Gel Extraction Kit (Qiagen, 2001 catalog # 20021) according to the manufacturer's protocol. The insert is ligated into the vector, resulting in a fusion between the yeast alpha-factor secretion signal peptide (sequence contained in pPICZalphaA) and CALB, and the resulting plasmid construct, pPal-CALB, is transformed by standard methods known in the art into competent HB101 cells (E. coli). The transformants are selected on LB-Amp agar plates. The CALB gene is sequenced by standard methods known in the art.

Point Mutagenesis

At the residues of the pair selected, as described above, amino acid substitutions are introduced by point mutation, so far as tyrosine is not already present at the selected residues, using forward primer for M1 together with Primer B, and forward and reverse primers for M2 and M3, as described in FIG. 15B. Point mutations are introduced by using the QuikChange™ Site-Directed Mutagenesis Kit (see above).

Protein Expression and Purification

Protein expression and purification are carried out according to an adapted method published by Rotticci-Mulder et al. The yeast strain P. pastoris SMD1168 (his4, pep4) (Invitrogen) is used for the expression of CALB (Schmidt-Dannert. Bioorg. & Med. Chem., 1999. Vol. 7, pp. 2123-2130; Rotticci-Mulder et al. Prot. Expr. & Purif. 2001. Vol. 21, pp. 386-392). Cells are made competent and transformed by standard methods known in the art, and transformants are selected on RD His agar plates (186 g sorbitol, 20 g agar, 20 g dextrose, 13.4 g yeast nitrogen base, 0.2 mg biotin, 50 mg amino acid mix without histidine per liter). P. pastoris is grown in YPD medium (10 g yeast extract, 20 g peptone, 20 g dextrose per liter) or BMGY medium (10 g yeast extract, 20 g peptone, 13.4 g yeast nitrogen base, 0.4 mg biotin, 10 mL glycerol, and 100 mL 1 M K₂HPO₄/KH₂PO₄, pH 6.0 per liter). Protein expression under the control of the AOX1 methanol-inducible promoter is induced by growing the culture in BMMY medium (10 g yeast extract, 20 g peptone, 13.4 g yeast nitrogen base, 0.4 mg biotin, 5 mL methanol, and 100 mL of a 1 M K₂HPO₄/KH₂PO₄ solution, pH 6.0 per liter).

Five hundred milliliters of BMGY in a 5000-mL E-flask are inoculated with 1 mL of an overnight yeast culture in YPD and grown overnight at 28° C., 300 rpm. The medium is changed for 500 mL BMMY to induce lipase expression. Methanol is added to the culture medium to a final concentration of 0.5% (v/v) every 24 h for the following 3 days. The sample is collected by separating the culture medium from the cells by centrifugation.

Aliquots of the sample are taken and concentrated according to standard procedures known in the art. The concentrated sample is separated by SDS-PAGE on a 12% polyacrylamide gel, and analyzed by Coomassie Blue and silver staining.

The protein is bound to an NTA column (Qiagen), which binds the protein's His-tag, according to the manufacturer's protocol, and the beads are washed several times with Phosphate Buffered Saline (PBS). Again the protein is analyzed by separation on a 12% polyacrylamide gel, followed by Coomassie Blue and silver staining.

Deglycosylation

Endoglycosidase H and endoglycosidase F (Boehringer-Mannheim, Mannheim, Germany) are used to cleave N-linked carbohydrates from CALB produced in P. pastoris. Digestion is performed according to the manufacturer's instructions under reducing conditions on the NTA beads. The deglycosylated protein is separated by SDS-PAGE on a 12% polyacrylamide gel, and analyzed by staining, and by Western blot analysis using an antibody to the c-myc tag (see above).

Active-Site Titration of Recombinant Lipase

Active-site titration of the purified lipase was performed using a methyl p-nitrophenyl n-hexylphosphonate inhibitor in order to determine the concentration of active enzyme (Rotticci-Mulder et al. Prot. Expr. & Purif. 2001. Vol. 21, pp. 386-392). The active-site concentration was determined by measuring the concentration of released p-nitrophenolate spectrophotometrically at 25° C. and 400 nm.

Lipase Activity Assay

The hydrolytic activity of the lipase is tested by measuring hydrolysis of tributyrin. The substrate solution (0.2 M tributyrin, 2% gum arabicum, 0.2 M CaCl₂) is emulsified by sonication for 1 min. The reaction is initiated by the addition of enzyme to the substrate emulsion. The enzymatic reaction is carried out at 25° C. and pH 7.5, and the level of the enzyme's activity is measured by titration of the released fatty acid with 100 mM sodium hydroxide, using a pH-stat (Rotticci-Mulder et al. Prot. Expr. & Purif. 2001. Vol. 21, pp. 386-392; TIM900 Titration Manager, Radiometer, Denmark).
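Converting a pH-stat reading into an activity value is a short calculation. The sketch below rests on two assumptions not stated in the text: one mole of NaOH neutralizes one mole of released fatty acid, and activity is expressed in the conventional unit of micromoles per minute. Function names and the example reading are hypothetical.

```python
def lipase_units(naoh_molar, naoh_volume_ul, minutes):
    """Micromoles of fatty acid released per minute.

    mol/L multiplied by microlitres gives micromoles, assuming 1:1
    neutralization of released fatty acid by the titrant.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return naoh_molar * naoh_volume_ul / minutes


def specific_activity(units, enzyme_mg):
    """Activity per milligram of enzyme added to the emulsion."""
    return units / enzyme_mg


# Hypothetical reading: 50 µl of 100 mM NaOH consumed over 5 min.
example_units = lipase_units(0.1, 50.0, 5.0)
```

Dividing by the active-site concentration from the titration described earlier, rather than by total protein, gives a value that is comparable between preparations of different purity.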

Stabilization of CALB

Introduction of the Dityrosine Bond

Introduction of the dityrosine bond is carried out both on and off the NTA beads. To cross-link the enzyme on the beads, the catalyst, metalloporphyrin 20-tetrakis(4-sulfonatophenyl)-21H,23H-porphine manganese (III) chloride (MnTPPS), is added to PBS to a concentration of 1 μM, 5 μM, 10 μM, 50 μM and 100 μM in the reaction. The reaction is initiated by the addition of the oxidant potassium mono-persulfate to a concentration of 1-100 μM, at room temperature or otherwise, for each of the concentrations of the catalyst. The beads are agitated, and after 45 seconds, 60 seconds, and 2 minutes the reaction is quenched by the addition of Tris HCl pH 7.9 to 50 mM and β-mercaptoethanol to 10 mM, and the beads are washed several times in PBS to remove the catalyst, oxidizing and reducing agents.

To cross-link the enzyme in solution, the protein is eluted from the NTA column according to the manufacturer's protocol, the eluate is equilibrated by dialysis in phosphate buffered saline (PBS), and the protein concentration is adjusted to several concentrations between 100 nM and 1 mM. The catalyst, metalloporphyrin 20-tetrakis(4-sulfonatophenyl)-21H,23H-porphine manganese (III) chloride (MnTPPS), is added on ice to the reaction to a concentration of 1 μM, 5 μM, 10 μM, 50 μM and 100 μM. The reaction is then initiated by the addition of the oxidant potassium mono-persulfate to a concentration of 1-100 μM, at room temperature or otherwise, for each of the concentrations of the catalyst, and at several protein concentrations. After 45 seconds the reaction is quenched by the addition of Tris.Cl pH 7.9 to 50 mM and β-mercaptoethanol to 10 mM, and the solution is again dialyzed against PBS to remove the catalyst, oxidizing and reducing agents.

The efficiency of the cross-link reaction is tested by reducing and non-reducing PAGE and Coomassie blue staining.

Improved Stability and Retained Activity

The retained hydrolytic activity of the lipase is tested by incubating equal amounts of the wild type and cross-linked mutants of the enzyme in PBS at 55° C., 60° C., 65° C., and 95° C. for 0, 1, 2, 5, 10, 15, 30, 60, and 90 min. Furthermore, the activity of the enzyme is assayed after adding 0, 10 mM, 50 mM, 150 mM, 0.5 M, 1 M, and 2 M of NaCl and other salts, or 0, 1 mM, 10 mM, 50 mM, 150 mM, 0.5 M, and 1 M beta-mercaptoethanol. The remaining activities of the wild type and various mutants are then assayed by hydrolysis of tributyrin, as described above. The enzymatic activity of the wild type and mutant enzymes in various pH conditions is determined spectrophotometrically by measuring the hydrolysis of p-nitrophenyl esters (e.g. p-nitrophenyl palmitate and/or p-nitrophenyl laurate), and the release of p-nitrophenol, at 410 nm.

Dityrosine Stabilization and Directed Evolution

General Approach

The strategy for combining a directed evolution approach with the dityrosine technology described herein is based on the concept that the cross-link conditions can be viewed as a selection environment/selective pressure to which the gene is adapted during the in vitro evolution of the enzyme. In the following, an approach is described that is an adaptation of the approach described by Liebeton et al. (Liebeton et al. “Directed Evolution of an Enantioselective Lipase”. Chem. & Biol. 2000. Vol. 7 (9), pp. 709-718). Random mutations are introduced to identify sites that enhance the cross-link efficiency, the enzyme's performance upon cross-linking, or the stability of the protein in the presence of the cross-link. These sites are then further examined by saturation mutagenesis to identify the optimal mutation at the identified site.

Thus, first the mutations to tyrosine are introduced at the selected residues, as described above. Second site mutations are then randomly introduced by error-prone PCR using the mutated gene as the template, and the resulting genes, containing on average approximately 1-2 mutations per copy, are ligated into the expression vector, pYES2.1/V5-His-TOPO (Invitrogen), and transformed into S. cerevisiae.

Secretion of the enzyme is directed by a S. cerevisiae signal peptide. The secreted protein is cross-linked in the supernatants of the cultures, and cross-linked and non-cross-linked protein is heat-treated at 60° C. The resulting enzymes are analyzed by adding a reaction buffer containing substrate specific for lipases, in which the activity of the enzyme can easily be detected by spectrophotometric analysis. Clones identified as more readily cross-linked, more active upon cross-linking, and/or more thermostable are recovered from the original S. cerevisiae clone and sequenced.

Second site mutations identified are further analyzed by saturation mutagenesis. Once the optimal mutation for a site is identified, a construct containing this mutation is used as the template for another round of random second site mutation screening and saturation mutagenesis analysis. This process is iterated 10 to 15 times.

Vector Construction of pYal-CALB

The DNA encoding the yeast alpha factor-CALB fusion proteins is amplified from the pPal-CALB vectors containing the point mutations, as described above, using the primers Primer C and D described in FIG. 15B. The PCR products are ligated into the pYES2.1/V5-His-TOPO vector (Invitrogen) according to the manufacturer's protocol, and transformed into competent HB101 cells (E. coli) according to standard procedures known in the art. The transformants are selected on LB-Amp agar plates. Plasmid DNA is isolated, and the CALB genes (wild type and mutants) are sequenced by standard methods known in the art.

These constructs are isolated and purified using the Qiagen Plasmid Maxi Kit (Qiagen, 2001 catalog number 12162) according to the manufacturer's protocol.

Error Prone PCR Reactions

10 μg of the pYal-CALB vectors are cut with the restriction enzymes EcoRI and NotI, and the resulting linearized plasmids are gel purified using the Qiaex II Gel Extraction Kit (see above) according to the manufacturer's protocol.

A total volume of 50 μl of 67 mM Tris HCl pH 8.8, 16.6 mM (NH₄)₂SO₄, 6.1 mM MgCl₂, 6.7 mM EDTA, 0.2 mM dNTPs, 10 mM beta-mercaptoethanol, 10% (v/v) DMSO, and 0.15 μM each of the Primers E and D, as described in FIG. 15B, contains 1 ng of template DNA and 2 units of Goldstar Taq-polymerase (Eurogentec). Ten parallel samples overlaid with 70 μl paraffin are amplified using the following thermo-cycling protocol:

-   -   1 cycle: 2 min. 95° C.
    -   25 cycles: 1 min. 94° C., 2 min. 64° C., 1 min. 64° C.
    -   1 cycle: 7 min. 72° C.

PCR products are gel purified with the Qiaex II Gel Extraction Kit, cut with the restriction enzymes EcoRI and NotI, and again gel purified with the Qiaex II Gel Extraction Kit (see above).

In a total volume of 10 μl, 5 pmols each of insert and vector are ligated for two hrs. at room temperature according to standard procedures known in the art. Ligated DNA is transformed into competent HB101 cells according to standard procedures known in the art, and the cells are grown overnight as a culture, selecting for ampicillin resistance. Plasmid DNA is recovered using the Qiagen Plasmid Midi Kit (Qiagen, 2001 catalog number 12143) according to the manufacturer's protocol.

Transformation and Expression in S. cerevisiae

The constructs are transformed into competent, uracil auxotrophic S. cerevisiae using the S.C. EasyComp Transformation Kit (Invitrogen, 2001 catalog number k5050-01) according to the manufacturer's protocol. Transformants are isolated on selection plates. Because expression of the inserts in the pYal-CALB vectors is driven by a Gal-inducible promoter, the yeast strains are grown in an SC-U medium with 2% glucose suppressing protein expression (supSC-U), containing 0.67% yeast nitrogen base (without amino acids, with ammonium sulfate), 2% glucose, 0.01% each of adenine, arginine, cysteine, leucine, lysine, threonine, tryptophan, and uracil, and 0.005% each of aspartic acid, histidine, isoleucine, methionine, phenylalanine, proline, serine, tyrosine, and valine. Protein expression is induced by changing the medium to an SC-U medium with 2% galactose (indSC-U), containing 0.67% yeast nitrogen base (without amino acids, with ammonium sulfate), 2% galactose, 0.01% each of adenine, arginine, cysteine, leucine, lysine, threonine, tryptophan, and uracil, and 0.005% each of aspartic acid, histidine, isoleucine, methionine, phenylalanine, proline, serine, tyrosine, and valine. Upon induction, the enzymes with and without the point mutations are secreted into the medium, and can easily be affinity purified by their His(6) tags over NTA columns. The optimal period of induction is determined by inducing for 1, 2, 8, and 36 hours and measuring the activities in the culture supernatants.

Approximately 1000-2000 transformants are each picked with sterile toothpicks and resuspended in a well of a 96-deep-well microtiter plate filled with 1 ml of supSC-U. Cultures are incubated on a shaker overnight at 30° C. To induce protein expression, the cultures are spun down (15 min. at 5000 g), the supernatants are removed, and 1 ml of indSC-U is added to each well. After induction, the cultures are again spun down, the supernatants are distributed into 96-well plates for analysis of the enzymes (see below), and the cells are resuspended and maintained in supSC-U so that the plasmid DNA can be recovered.

Cross-Linking in Supernatants of the Cultures

Cross-linked and uncross-linked enzymes are compared after heat-inactivation; because of the large number of colonies to be screened for increased activity/stability, the protein in the 96-well plates is cross-linked directly in the supernatants of the cultures.

35 μl of each supernatant is transferred to each of two 96-well plates. To each sample, 5 μl of 10× PBS and 5 μl of 1 mM MnTPPS (catalyst, see above) are added; to the samples on one of the two plates, 5 μl of 1 mM KHSO₅ (oxidant) is also added. After 2 minutes, the cross-link reaction is quenched in the samples of the plate to which the oxidant was added by the addition of 2.5 μl of 2.88 M β-mercaptoethanol. To the samples on the other plate, 7.5 μl of 1× PBS are added.
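The pipetting scheme above can be checked with simple dilution arithmetic. The following is a minimal sketch that uses only the volumes and stock concentrations stated in the text; the variable and function names are illustrative, and the calculation assumes ideal mixing with additive volumes.

```python
# Volumes in microliters, as stated in the text.
SUPERNATANT = 35.0
PBS_10X = 5.0
CATALYST = 5.0        # 1 mM stock
OXIDANT = 5.0         # 1 mM stock
QUENCH = 2.5          # 2.88 M beta-mercaptoethanol stock
PBS_1X_CONTROL = 7.5  # added to the plate that receives no oxidant


def final_conc(stock, added_volume, total_volume):
    """Dilution rule: C_final = C_stock * V_added / V_total."""
    return stock * added_volume / total_volume


# Plate that is cross-linked: volume during the 2-minute reaction,
# and volume after the quench has been added.
reaction_volume = SUPERNATANT + PBS_10X + CATALYST + OXIDANT
quenched_volume = reaction_volume + QUENCH

# Plate that is not cross-linked: buffer replaces oxidant plus quench.
control_volume = SUPERNATANT + PBS_10X + CATALYST + PBS_1X_CONTROL

catalyst_mM = final_conc(1.0, CATALYST, reaction_volume)
oxidant_mM = final_conc(1.0, OXIDANT, reaction_volume)
pbs_fold = final_conc(10.0, PBS_10X, reaction_volume)
quench_mM = final_conc(2880.0, QUENCH, quenched_volume)

print(f"reaction volume: {reaction_volume} ul, PBS: {pbs_fold:.2f}x")
print(f"catalyst: {catalyst_mM:.3f} mM, oxidant: {oxidant_mM:.3f} mM")
print(f"quench: {quench_mM:.1f} mM; final volumes "
      f"{quenched_volume} ul vs {control_volume} ul")
```

Under these assumptions both plates end at the same final volume, so the supernatant is diluted equally in the cross-linked and uncross-linked samples before the activity assay.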

Lipase Stabilization/Activity Assay

Lipase activity is measured both before and after heat inactivation. The period for which the protein is best heat-treated at 60° C. is determined on the wild-type in a time-course experiment. A cross-linked and a non-cross-linked 96-well plate are each heat-inactivated at 60° C. for the determined period of time. Lipase activities are determined by hydrolysis of p-nitrophenyl palmitate and spectrophotometric analysis at 410 nm, according to the methods published by Liebeton et al. and Winkler & Stuckmann (Liebeton et al. “Directed Evolution of an Enantioselective Lipase”. Chem. & Biol. 2000. Vol. 7 (9), pp. 709-718; Winkler & Stuckmann. “Glycogen, Hyaluronate, and Some Other Polysaccharides Greatly Enhance the Formation of Exolipase by Serratia marcescens”. J. Bacteriol. 1979. Vol. 138, pp. 663-670).
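The comparison of the two plates can be sketched as a per-well calculation of residual activity, that is, activity after heat treatment divided by activity before. The following is a minimal illustration only: the function names, the well identifiers, the activity values, and the 1.5-fold cut-off are hypothetical and are not taken from the methods cited above.

```python
def residual_activity(before, after):
    """Fraction of activity retained per well after heat treatment."""
    return {
        well: after[well] / before[well]
        for well in before
        if well in after and before[well] > 0
    }


def rank_stabilized(xl_before, xl_after, ctl_before, ctl_after, min_ratio=1.5):
    """Wells whose cross-linked sample retains more activity than the
    uncross-linked sample by at least min_ratio (an assumed cut-off)."""
    xl = residual_activity(xl_before, xl_after)
    ctl = residual_activity(ctl_before, ctl_after)
    hits = []
    for well, fraction in xl.items():
        baseline = ctl.get(well)
        if baseline is None or baseline <= 0:
            continue  # no usable uncross-linked reference for this well
        ratio = fraction / baseline
        if ratio >= min_ratio:
            hits.append((well, ratio))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Made-up activities (arbitrary units) for two wells.
xl_before = {"A1": 1.00, "A2": 0.80}
xl_after = {"A1": 0.50, "A2": 0.08}
ctl_before = {"A1": 1.00, "A2": 0.80}
ctl_after = {"A1": 0.25, "A2": 0.08}

hits = rank_stabilized(xl_before, xl_after, ctl_before, ctl_after)
print(hits)
```

In this made-up data set only well A1 is reported, because its cross-linked sample retains twice the fraction of activity that its uncross-linked sample retains.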

Saturation Mutagenesis

Saturation mutagenesis is performed as described for site-directed point mutagenesis, with mutagenic primers in which the codon under investigation is randomized by mixing equal amounts of the nucleoside phosphoramidites during synthesis. The optimal codon for that position is again identified by screening approximately 150-200 clones for activity upon cross-linking with and without heat treatment, as described above.
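The number of clones screened can be related to library coverage. The following is a minimal sketch that assumes the randomized codon is a uniform mixture of all 64 triplets (equal amounts of the four bases at each position) and that clones are independent draws; under these assumptions, the probability that a given variant is seen at least once among n clones is 1 - (1 - k/64)^n, where k is the number of codons encoding that variant. The function and table names are illustrative.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
CODONS_PER_AA = Counter(CODON_TABLE.values())


def p_seen(n_clones, n_codons=1, library_size=64):
    """Probability that at least one of n_clones carries one of
    n_codons specific codons, assuming a uniform NNN library."""
    return 1.0 - (1.0 - n_codons / library_size) ** n_clones


for n in (150, 200):
    single = p_seen(n)
    tyrosine = p_seen(n, CODONS_PER_AA["Y"])
    print(f"{n} clones: specific codon {single:.3f}, tyrosine {tyrosine:.3f}")
```

Real primer pools are rarely perfectly uniform, so these values are best read as upper-bound estimates of coverage.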

8. EXAMPLE III Subtilisin E

The following example illustrates certain variations of the methods of the invention for protein and protein complex stabilization. This example is presented by way of illustration and not by way of limitation to the scope of the invention.

Introduction

In the following section, methods are described for stabilizing one polypeptide, a biocatalyst, for which structural data are available for several structurally or functionally related polypeptides. Specifically, described below are the residue pair selection process, the introduction of point mutations, bacterial expression of the polypeptides and their purification, the cross-link reaction itself, and analysis of the resulting stabilized biocatalyst. For the description of the cross-link reaction and the adjustment of the cross-link reaction conditions, refer to Chapter 6.

The biocatalyst stabilized in the example below is the serine endopeptidase Subtilisin E (FIG. 16A), which is one of the most commercially important biocatalysts. Subtilisin E is a secreted protein of Bacillus subtilis, and it cleaves ester and amide bonds. It is used for the total hydrolysis of proteins and peptides at alkaline pH. It has been successfully applied toward the racemic resolution of amino acids, amines, carboxylic acids and alcohols, and in peptide synthesis, e.g. C-terminal deprotection.

The structure files containing the three-dimensional atomic coordinates of the polypeptides are obtained from the Brookhaven National Laboratory Protein Database. The derivative data relevant to the selection process are calculated as described. In addition to the statistical selection process, carried out using a set of convenient and appropriate filters, data regarding improved stability of the protein upon introduction of disulfide bonds are used to select potential residue pairs to which the cross-link is directed.

Point mutations to tyrosine (directing the cross-link reaction) are introduced according to the final selection of residue pairs (Tables 15 and 16, FIG. 16D), and expressed in Bacillus subtilis. The polypeptide is affinity purified and cross-linked, and the resulting biocatalyst is evaluated, as described.

Selection of Optimal Residues for Tyrosyl-Tyrosyl Cross-Link

The selection process consisted of (1) a review of functional data on subtilisin enzymes with improved half-lives upon introduction of disulfide bonds, and (2) statistical measurements of the alpha carbon distances within the polypeptides, applied as a series of tests or ‘filters’ aimed at successively narrowing down the residue pairs most likely to result in a cross-linked tyrosine pair that minimally alters the activity or specificity of the enzyme, while lending maximal stability. The residue pairs are then further evaluated by computationally modeling the mutations to tyrosine.

Data Used for the Analysis

Coordinate data for distance calculations of 3 related subtilisin proteins (subtilisin E and BPN, and subtilisin from Bacillus lentus) from crystallographically solved structures were downloaded from the protein structure database at Brookhaven National Laboratory (http://www.pdb.bnl.gov or http://www.rcsb.org; files 1SCJ, 1DUI, 1C13). These data provide the three-dimensional coordinates (x, y, and z) for each atom in the solved structure, expressed in metric units, i.e. Ångströms (10⁻¹⁰ m, Å). These data also contain the sequence and/or amino acid usage of the polypeptide. With these data, aligned as shown in FIGS. 16B and C, it was possible to calculate the three-dimensional distances between any desired atoms. Functional data regarding improved stability of the enzyme were taken from the literature (see below).
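By way of illustration, alpha carbon coordinates can be read from the fixed-column ATOM records of such a structure file and used for a distance calculation. The following is a minimal sketch, not the procedure actually used; the helper names and the two synthetic records with made-up coordinates are assumptions for illustration, and no real entry such as 1SCJ is read here.

```python
import math


def parse_ca_atoms(pdb_lines):
    """Map (chain, residue number) to (residue name, x, y, z) for alpha carbons.

    Uses the fixed columns of PDB ATOM records: atom name 13-16,
    alternate location 17, residue name 18-20, chain 22,
    residue number 23-26, and x, y, z in columns 31-54.
    """
    ca_atoms = {}
    for line in pdb_lines:
        if len(line) < 54 or not line.startswith("ATOM  "):
            continue
        if line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate location
        key = (line[21], int(line[22:26]))
        ca_atoms[key] = (
            line[17:20].strip(),
            float(line[30:38]),
            float(line[38:46]),
            float(line[46:54]),
        )
    return ca_atoms


def ca_distance(ca_atoms, key_a, key_b):
    """Euclidean distance in Angstroms between two alpha carbons."""
    return math.dist(ca_atoms[key_a][1:], ca_atoms[key_b][1:])


def _atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a synthetic ATOM record in the fixed-column layout."""
    return "ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
        serial, name, resname, chain, resseq, x, y, z
    )


# Synthetic records with made-up coordinates, for illustration only.
example_lines = [
    _atom_line(1, " CA ", "TYR", "A", 6, 0.0, 0.0, 0.0),
    _atom_line(2, " CB ", "TYR", "A", 6, 1.5, 0.0, 0.0),
    _atom_line(3, " CA ", "PRO", "A", 202, 3.0, 4.0, 0.0),
]
ca_atoms = parse_ca_atoms(example_lines)
print(ca_distance(ca_atoms, ("A", 6), ("A", 202)))
```

The sketch keys residues by chain and residue number as they appear in the file; comparing several structures additionally requires the residue numbering to be brought into register by the alignment.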

Selection Methodology

Optimal residues, to which the cross-link reaction is directed, were selected first based on the amino acid usage within the set of structurally and functionally related polypeptides, selecting for residues that in all of the polypeptides of the set are either Trp, Tyr, Phe, Lys, Pro, or His residues. From this set of residues, residue pairs were selected based on their average alpha carbon distances within the set of structurally and functionally related polypeptides. Finally, residue pairs were selected from the above set of residue pairs based on the proximity of the modeled tyrosine side-chains. This was done by modeling the mutations using the automated, knowledge-based protein modeling server Swiss Model, and visualizing the resultant polypeptides' structures with the program Swiss pdb Viewer, both of which are available from the proteomics server of the Swiss Institute of Bioinformatics (SIB; www.expasy.ch). Additionally, residue pairs were selected that had previously been mutated to cysteines and formed disulfide bonds, stabilizing the enzyme and maintaining its activity.

Filter 1: Selection of Residues Based on Amino Acid Usage

To minimize the distortions that point mutations to tyrosine will introduce into the structure of the enzyme, residues were selected that in every enzyme of the sample have aromatic or hydrophobic amino acids. Amino acids that were scored for included Trp, Tyr, Phe, His, Pro, Lys, Leu, and Arg, whereby Leu and Arg were only permitted in maximally ⅓ of the sample. Selected residues are listed in Table 13.

TABLE 13. Selected residues based on their amino acid usage.

Residue   AA Consensus*       Residue   AA Consensus*
-------   -----------------   -------   -----------------
6         Tyr (W)             130       Pro
14        Pro                 168       Tyr
17        His                 169       Pro
21        Tyr (K)             172       Tyr
27        Lys                 190       Phe
39        His                 202       Pro
40        Pro                 211       Pro
50        Phe                 215       Tyr
52        Pro                 218       (Leu, Tyr, Lys)
57        Pro                 226       Pro
65        His                 227       His
68        His                 238       Lys
87        Pro                 240       Pro
92        Tyr                 242       Trp
95        Lys                 263       Tyr (L)
114       Trp                 264       Tyr

*Non-consensus amino acids occurring at a position are indicated in parentheses.
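The usage rule of Filter 1 can be expressed as a test on one aligned position at a time. The following is a minimal sketch under stated assumptions: each position is represented by the three-letter residue codes found at that position in each enzyme of the sample, the function name is illustrative, and the example columns are hypothetical rather than read from the aligned sequences.

```python
ALLOWED = {"TRP", "TYR", "PHE", "HIS", "PRO", "LYS", "LEU", "ARG"}
LIMITED = {"LEU", "ARG"}  # permitted in at most one third of the sample


def passes_usage_filter(residues):
    """True if every enzyme has an allowed residue at this position and
    Leu/Arg occur in no more than one third of the enzymes."""
    if not residues:
        return False
    if any(res not in ALLOWED for res in residues):
        return False
    limited_count = sum(res in LIMITED for res in residues)
    # Integer comparison avoids floating-point rounding at exactly 1/3.
    return 3 * limited_count <= len(residues)


# Hypothetical aligned columns for a sample of three enzymes.
print(passes_usage_filter(["TYR", "TYR", "LEU"]))  # one Leu in three
print(passes_usage_filter(["TYR", "LEU", "LEU"]))  # two Leu in three
print(passes_usage_filter(["TYR", "GLY", "TYR"]))  # Gly is not scored for
```

With a sample of three enzymes, the one-third limit means that at most one enzyme may carry Leu or Arg at a selected position.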

Filter 2: Selection of Residue Pairs Based on Average Alpha Carbon Distances

To find residue pairs spaced appropriately for a tyrosyl-tyrosyl bond, the alpha carbon to alpha carbon distance between every residue pair in each of the polypeptides in the set used for the statistical analysis was calculated in a 3D database. This calculation was performed by applying Pythagorean geometry to the 3D coordinates of the alpha carbons (FIG. 6). Analogously to the selection described in Chapter 7, the range that was selected for was the following:

-   Min 5.70 Å, Max 9.74 Å.

Furthermore, because the dityrosine bond is intended to stabilize a single polypeptide rather than cross-link two or more proteins of a complex, it was important to select for residues that were sufficiently spaced along the polypeptide chain to maximize the stabilizing effect of the engineered dityrosine bond. Residue pairs were selected that are more than 40 residues apart.

TABLE 14. Aromatic residue pairs with alpha carbon distances within the selection range, each spaced more than 40 residues apart.

Subtilisin E residue pair   Alpha carbon average distance (Å)   Alpha carbon distance st. dev. (Å)
-------------------------   ---------------------------------   ----------------------------------
Tyr6 Pro202                 8.2                                 0.32
His17 Pro87                 8.9                                 0.08
Tyr21 Pro87                 9.5                                 0.16
Tyr21 Lys238                6.3                                 0.51
Lys27 Tyr92                 7.4                                 0.09
His39 Pro211                6.8                                 0.22
Phe50 Lys95                 6                                   0.04
Phe50 Trp114                9.6                                 0.07
His65 Pro211                9.1                                 0.04
His65 Tyr218                9.0                                 0.03
His68 Pro211                8.2                                 0.06
His68 Tyr215                8.1                                 0.03
His68 Tyr218                8.3                                 0.002
His68 Pro226                9.5                                 0.06
Pro130 Lys171               9.5                                 0.11

Based on these calculations, as a second cut, all residue pairs were selected from the set of residues identified based on the residues' amino acid usage that have average alpha carbon distances within the selection range, and that are sufficiently spaced, as listed in Table 14.
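The second filter can be sketched as follows. This is a minimal illustration under several assumptions: each structure is represented as a mapping from residue number to alpha carbon coordinates, all structures share one residue numbering after alignment, and the sample standard deviation is reported (the text does not state which estimator was used). The function name and the toy coordinates are hypothetical.

```python
import math
from statistics import mean, stdev

MIN_CA_DISTANCE = 5.70   # Angstroms, lower bound of the selection range
MAX_CA_DISTANCE = 9.74   # Angstroms, upper bound of the selection range
MIN_SEPARATION = 40      # pairs must be more than 40 residues apart


def select_pairs(structures, candidates):
    """Return (res_a, res_b, average distance, st. dev.) for candidate pairs
    whose average alpha carbon distance lies within the selection range and
    whose sequence separation exceeds MIN_SEPARATION."""
    selected = []
    ordered = sorted(candidates)
    for i, res_a in enumerate(ordered):
        for res_b in ordered[i + 1:]:
            if res_b - res_a <= MIN_SEPARATION:
                continue
            distances = [math.dist(s[res_a], s[res_b]) for s in structures]
            average = mean(distances)
            if MIN_CA_DISTANCE <= average <= MAX_CA_DISTANCE:
                selected.append((res_a, res_b, average, stdev(distances)))
    return selected


# Toy coordinates (made up) for three aligned structures.
structures = [
    {6: (0.0, 0.0, 0.0), 50: (20.0, 0.0, 0.0), 202: (8.0, 0.0, 0.0)},
    {6: (0.0, 0.0, 0.0), 50: (20.0, 0.0, 0.0), 202: (8.2, 0.0, 0.0)},
    {6: (0.0, 0.0, 0.0), 50: (20.0, 0.0, 0.0), 202: (8.4, 0.0, 0.0)},
]
selected = select_pairs(structures, [6, 50, 202])
for res_a, res_b, average, spread in selected:
    print(f"{res_a}-{res_b}: average {average:.2f} A, st. dev. {spread:.2f} A")
```

Averaging over the related structures, rather than using subtilisin E alone, favors pairs whose spacing is conserved across the set; a small standard deviation indicates such conservation.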

Residue Pair Selection Based on Structural Modeling and Visualization of the Mutations

By modeling the mutations indicated in Table 14, the likelihood was assessed that each residue pair would form a dityrosine bond, stabilize the enzyme, and introduce minimal distortions into the structure of the protein, particularly in the active site of the enzyme, to maximize its retained activity and specificity. This was achieved by using the automated knowledge-based protein modeling server Swiss Model, and visualizing the resultant polypeptides' structures with the program Swiss pdbViewer, as stated above. Taking into consideration the epsilon carbon distances between the modeled tyrosyl side chains, calculated in Swiss pdbViewer, and the residues' proximity to the active site, the residue pairs that looked most promising were selected. The remaining residue pairs are listed in Table 15.

TABLE 15. List of remaining residue pairs with relevant distance measurements.

CALB residue pair   Alpha carbon distance   Cα-Cβ Distance Difference   Epsilon carbon distance*
-----------------   ---------------------   -------------------------   ------------------------
Tyr6 Pro202         8.2                     0.32                        4.30
His17 Pro87         8.9                     0.08                        5.31
Tyr21 Lys238        6.3                     0.51                        4.02
Lys27 Tyr92         7.4                     0.09                        5.69

*Epsilon carbon distances of the modeled tyrosine pairs.

Selection of Additional Residue Pairs Based on Functional Data

Functional data are available regarding the positional suitability of residues at which engineered disulfide bonds improve the stability of subtilisin enzymes. This information was taken into account, and residues were added to the selection of Table 15 that were able to confer significant stability by forming a disulfide bond between engineered cysteine side-chains while maintaining the enzymes' activity.

Articles containing such data include Takagi et al., 1990 (Enhancement of the Thermostability of Subtilisin E by Introduction of a Disulfide Bond Engineered on the Basis of Structural Comparison with a Thermophilic Serine Protease. JBC 1990. Vol. 265(12); pages 6874-8), Mansfeld et al., 1997 (Extreme Stabilization of a Thermolysin-like Protease by an Engineered Disulfide Bond. JBC 1997. Vol. 272(17); pages 11152-56), Takagi et al., 2000 (Engineering Subtilisin E for Enhanced Stability and Activity in Polar Organic Solvents. J. Biochem. 2000. Vol. 127; pages 617-25), and Mitchinson and Wells (Protein engineering of disulfide bonds in subtilisin BPN′. Biochemistry 1989. Vol. 28(11); pages 4807-15). In Table 16 below, these additionally selected residues are listed along with their most relevant functional data.

TABLE 16. Additionally selected residue pairs based on disulfide bond data from the literature.

Enzyme          Mutations/Disulfide positions   Secondary Structures*   Half-life   Activity
-------------   -----------------------------   ---------------------   ---------   ---------
Subt. E & BPN   G61C/S98C & N61C/A98C           H3 - BS3                2-3 × w/t   w/t
Subt. E         K170C/E195C                     BS6 - BS7               60% w/t     46% w/t
BPN             D36C/P210C                      BS2 - BS8               w/t         No report

*Secondary structures cross-linked by the disulfide bond. H: alpha helix; BS: beta sheet.

Introduction of the Point Mutations at the Selected Residues

According to the final selection of residue pairs (Tables 15 and 16, FIG. 16D), PCR is used to introduce point mutations to tyrosine, and nucleotides are added to the 3′ end of the wild-type and mutant genes (FIG. 16D, Primers A and B) to introduce a poly-histidine tag to the polypeptide. Point mutations are introduced by PCR using the QuikChange™ Site-Directed Mutagenesis Kit (Stratagene, 1998 Catalog #200518). The 5′ primer (FIG. 16D, Primer A) creates an NdeI site, and the 3′ primer (FIG. 16D, Primer B) creates a BamHI site.

The PCR product is digested with NdeI and BamHI, purified, and ligated into the multiple cloning site of a shuttle expression vector that propagates both in Bacillus and in E. coli, and that directs expression of the polypeptide under the Bacillus subtilis subtilisin promoter (PBE3, Zhao and Arnold, 1999). Ligated constructs are transformed into competent HB101 cells, grown, isolated, and analyzed by standard restriction enzyme digestion and sequencing.

Expression and Purification of the Protein

To express the proteins, the plasmids described above are transformed into competent cells of a subtilisin-negative strain of Bacillus subtilis (DB428; Zhao and Arnold, 1999). Cells are grown for 36 hours at 37° C., and protein is purified from the supernatants of the cultures.

The protein is bound to an NTA column supplied by Invitrogen, which binds the proteins' His-tags, by methods known to one skilled in the art and/or according to the manufacturer's protocol, and the beads are washed several times with Phosphate Buffered Saline (PBS). The cross-link reaction and the adjustment of the reaction conditions, as otherwise described in Chapter 6, are carried out on the beads in PBS containing the catalyst of the cross-link reaction, 5,10,15,20-tetrakis(sulfonatophenyl)-21H,23H-porphine manganese(III) chloride (MnTTP), and the oxidant, KHSO₅, supplied by Fluka as 47% of a mixture containing KHSO₄ and K₂SO₄.

Analysis of the Resultant Cross-Linked Enzyme

The assay for the activities of the various mutants of the enzyme is carried out using 0.2 nM suc-AAPF-pNa as the substrate in a buffer containing 100 mM Tris, pH 8.0, and 10 mM CaCl₂. The activity is monitored spectrophotometrically by measuring the absorbance of the reaction mixture at a wavelength of 410 nm.

The enzymes are analyzed first to determine the mutants' activity before cross-linking, relative to the wild-type enzyme. Enzymes purified from 100 μl of the culture supernatants are analyzed for their activity by letting the enzyme assay reaction run for 0, 30, 60, and 90 min. Furthermore, the enzymes are analyzed for activity before and after cross-linking, as described above. Finally, the stability of the enzymes is determined by time-course heat inactivation experiments, where the enzymes are incubated for 0, 1, 2, 5, 15, and 60 minutes at 45° C., 55° C., 65° C., and 95° C.
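A heat-inactivation time course of this kind can be summarized as a half-life at each temperature. The following is a minimal sketch that assumes first-order inactivation, A(t) = A₀·exp(−kt), which is a modeling assumption and is not stated in the text; the rate constant k is estimated by a least-squares line through ln A versus t, and the half-life is ln 2 / k. The function name and the synthetic activities are illustrative.

```python
import math


def half_life(times_min, activities):
    """Half-life in minutes from a least-squares fit of ln(activity)
    against time, assuming first-order inactivation."""
    points = [(t, math.log(a)) for t, a in zip(times_min, activities) if a > 0]
    if len(points) < 2:
        raise ValueError("need at least two time points with activity > 0")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    rate = -sxy / sxx  # first-order inactivation constant k, per minute
    if rate <= 0:
        return math.inf  # no measurable loss of activity
    return math.log(2) / rate


# Incubation times from the text; activities are synthetic (k = 0.1 per min).
times = [0, 1, 2, 5, 15, 60]
synthetic = [math.exp(-0.1 * t) for t in times]
print(f"half-life: {half_life(times, synthetic):.2f} min")
```

Comparing the half-lives of the cross-linked and uncross-linked enzyme at the same temperature gives a single stabilization factor per mutant, in the same form as the half-life ratios reported for the disulfide variants in Table 16.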

9. REFERENCES

-   Campbell L. A. et al. Protein Cross-linking Mediated by Metalloporphyrins. Bioorganic and Medicinal Chemistry, vol. 6: pp. 1301-1037, 1998
-   Brown K. C. et al. Highly Specific Oxidative Cross-link of Proteins Mediated by a Nickel-peptide Complex. Biochem.; vol. 34(14): pp. 4733-4739, 1995
-   Pollitt S. and Schultz P. Angew. Chem. Int. Ed.; vol. 37(15): pp. 2104-2107, 1998
-   Spangler B. D. and Erman J. E. Cytochrome c Peroxidase Compound I: Formation of Covalent Protein Crosslinks During the Endogenous Reduction of the Active Site. Biochim. Biophys. Acta; vol. 872(1-2): pp. 155-7, 1986
-   Gmeiner B. and Seelos C. Phosphorylation of Tyrosine Prevents Dityrosine Formation in vitro. FEBS Lett; vol. 255(2): pp. 395-7, 1989
-   Kanwar R. and Balasubramanian D. Structure and Stability of the Dityrosine-linked Dimer of GammaB-crystallin. Exp. Eye Res.; vol. 68(6): pp. 773-84, 1999
-   Fancy D. A. and Kodadek T. Chemistry for the Analysis of Protein-protein Interactions: Rapid and Efficient Cross-linking Triggered by Long Wavelength Light. Proc. Natl. Acad. Sci., U.S.A.; vol. 96: pp. 6020-24, 1999
-   Klinman J. P. (ed.). Redox-active Amino Acids in Biology. Methods in Enzymology; vol. 258, 1995
-   Richards, F. M. The Interpretation of Protein Structures: Total Volume, Group Volume Distributions and Packing Density. J. Mol. Biol.; vol. 82: pp. 1-14, 1974
-   Eisenberg, D. Three-dimensional Structure of Membrane and Surface Proteins. Ann. Rev. Biochem.; vol. 53: pp. 595-623, 1984
-   National Brookhaven Laboratory Protein Database (on-line at www.nbl.pdb.gov)
-   Pastan et al. Recombinant Disulfide Stabilized Polypeptide Fragments Having Binding-specificity. U.S. Pat. No. 5,747,654, issued May 5, 1998
-   Hofmann K. The Modular Nature of Apoptotic Signaling Proteins. Cell Mol. Life Sci.; vol. 55(8-9): pp. 1113-28, 1999
-   Johnson, G. et al. Weir's Handbook of Experimental Immunology I. Immunochemistry and Molecular Immunology, Fifth Edition, Ed. L. A. Herzenberg, W. M. Weir, and C. Blackwell, Blackwell Science Inc., Cambridge, Mass., Chapter 6.1-6.21, 1996
-   Wickelgren I. Mining the genome for drugs. Science; vol. 285(5430): pp. 998-1001, 1999
-   Leong S. R. et al. IL-8 single-chain homodimers and heterodimers: interactions with chemokine receptors CXCR1, CXCR2, and DARC. Protein Sci.; vol. 6(3): pp. 609-17, 1997
-   Pawson T. Tyrosine Kinase Signalling Pathways. Princess Takamatsu Symp.; vol. 24: pp. 303-22, 1994
-   Cowburn D. Peptide Recognition by PTB and PDZ Domains. Curr. Opin. Struct. Biol.; vol. 7(6): pp. 835-8, 1997
-   Bockaert J. and Pin J. P. Molecular Tinkering of G Protein-coupled Receptors: an Evolutionary Success. EMBO J.; vol. 18(7): pp. 1723-9, 1999
-   Royet J. et al. Notchless Encodes a Novel WD40-repeat-containing Protein that Modulates Notch Signaling Activity. EMBO J.; vol. 17(24): pp. 7351-60, 1998
-   Chou J. J. et al. Solution Structure of the RAIDD CARD and Model for CARD/CARD Interaction in Caspase-2 and Caspase-9 Recruitment. Cell; vol. 94(2): pp. 171-80, 1998
-   Black R. A. and White J. M. ADAMs: Focus on the Protease Domain. Curr Opin Cell Biol.; vol. 10(5): pp. 654-9, 1998
-   Strasser A. and Newton K. FADD/MORT1, a Signal Transducer that Can Promote Cell Death or Cell Growth. Int. J. Biochem. Cell. Biol.; vol. 31(5): pp. 533-7, 1999
-   McInnes C. and Sykes B. D. Growth Factor Receptors: Structure, Mechanism, and Drug Discovery. Biopolymers; vol. 43(5): pp. 339-66, 1997
-   Lotz M. et al. The Nerve Growth Factor/Tumor Necrosis Factor Receptor Family. J. Leukoc. Biol.; vol. 60(1): pp. 1-7, 1996
-   Casaccia-Bonnefil P. et al. p75 Neurotrophin Receptor as a Modulator of Survival and Death Decisions. Microsc Res Tech.; vol. 45(4-5): pp. 217-24, 1999
-   Natoli G. et al. Apoptotic, Non-apoptotic, and Anti-apoptotic Pathways of Tumor Necrosis Factor Signalling. Biochem. Pharmacol.; vol. 56(8): pp. 915-20, 1998
-   Alber T. Structure of the Leucine Zipper. Curr. Opin. Genet. Dev.; vol. 2(2): pp. 205-10, 1992
-   Griffith T. S. et al. Functional Analysis of TRAIL Receptors Using Monoclonal Antibodies. J. Immunol.; vol. 162(5): pp. 2597-605, 1999
-   Yasuda H. et al. Identity of Osteoclastogenesis Inhibitory Factor (OCIF) and Osteoprotegerin (OPG): a Mechanism by which OPG/OCIF Inhibits Osteoclastogenesis in vitro. Endocrinology; vol. 139(3): pp. 1329-37, 1998
-   Ortiz A. et al. New Kids in the Block: the Role of FasL and Fas in Kidney Damage. J. Nephrol.; vol. 12(3): pp. 150-8, 1999
-   Price Waterhouse: Survey of Biopharmaceutical Industry, 1998
-   Boston Consulting Group: The Contribution of Pharmaceutical Companies: What's at stake for America, 1993
-   Pharmaceutical Research and Manufacturers of America. New Medicines in Development, Survey. www.phrma.org/publications/industry/profile99/chap2.html, 1998
-   Penuche M. L. et al. Antibody-IL-2 Fusion Proteins: a Novel Strategy for Immune Protection. Hum Antibodies; vol. 8(3): pp. 106-18, 1997
-   Sensel M. G. et al. Engineering Novel Antibody Molecules. Chem. Immunol.; vol. 65: pp. 129-58, 1997
-   Reiter Y. and Pastan I. Recombinant Fv Immunotoxins and Fv Fragments as Novel Agents for Cancer Therapy and Diagnosis. TIBTECH; vol. 16(12): pp. 513-520, 1998
-   Reiter Y. et al. Engineering Antibody Fv Fragments for Cancer Detection and Therapy: Disulfide-stabilized Fv Fragments. Nat Biotech.; vol. 14: pp. 1239-1245, 1996
-   Pluckthun A. and P. Pack. New Protein Engineering Approaches to Multi-valent and Bi-specific Antibody Fragments. Immunotechnology; vol. 3(2): pp. 83-105, 1997
-   Wright A. and Morrison S. L. Effect of Glycosylation on Antibody Function: Implications for Genetic Engineering. Trends Biotechnol.; vol. 15(1): pp. 26-32, 1997
-   Schwartz M. A. et al. Monoclonal Antibody Therapy. Cancer Chemother. Biol. Response Modif.; vol. 13: pp. 156-74, 1992
-   Houghton A. N. and Scheinberg D. A. Monoclonal Antibodies: Potential Applications to the Treatment of Cancer. Semin Oncol.; vol. 13(2): pp. 165-79, 1986
-   Cao Y. and Suresh M. R. Bi-specific Antibodies as Novel Bio-conjugates. Bioconjugate Chemistry; vol. 9(6): pp. 635-644, 1998
-   Raag R. and Whitlow M. Single-chain Fvs. FASEB; vol. 9: pp. 73-80, 1995
-   Webber K. O. et al. Preparation and Characterization of a Disulfide-stabilized Fv Fragment of the Anti-Tac Antibody: Comparison with its Single-chain Analog. Mol. Immunol.; vol. 32(4): pp. 249-258, 1995
-   Klinman J. P. (ed.). Redox-active Amino Acids in Biology. Methods in Enzymology, vol. 258, 1995
-   Bosilevac J. M. et al. Inhibition of Activating Transcription Factor 1- and cAMP-responsive Element-binding Protein-activated Transcription by an Intracellular Single-chain Fv fragment. J. Biol. Chem.; vol. 273(27): pp. 16874-16879, 1998
-   Graus-Porta D. et al. Single Chain Mediated Intracellular Retention of ErbB-2 Impairs Neu Differentiation Factor and Epidermal Growth Factor Signaling. Mol. Cell Biol.; vol. 15: pp. 1182-1191, 1995
-   Richardson J. H. et al. Phenotypic Knockout of the High-affinity Interleukin 2 Receptor by Intracellular Single Chain Antibodies against the Alpha Subunit of the Receptor. Proc. Nat. Acad. Sci., USA; vol. 92: pp. 3137-3141, 1995
-   Maciejewski J. P. et al. Intracellular Expression of Antibody Fragments Directed against Human Immunodeficiency Virus Reverse Transcriptase Prevents HIV Infection in vitro. Nat. Med.; vol. 1: pp. 667-673, 1995
-   Marasco W. A. et al. Design, Intracellular Expression, and Activity of a Human Anti-human Immunodeficiency Virus Type I gp120 Single Chain Antibody. Proc. Nat. Acad. Sci., USA; vol. 90: pp. 7889-7893, 1993
-   Levy Mintz P. et al. Intracellular Expression of Single Chain Variable Fragment to Inhibit Early Stages of the Viral Life Cycle by Targeting Human Immunodeficiency Virus Type I Integrase. J. Virol.; vol. 70: pp. 8821-8832, 1996
-   Duan L. et al. Intracellular Immunization Against Human Immunodeficiency Virus Type I Infection of Human T Lymphocytes: Utility of Anti-rev Single Chain Variable Fragment. Hum. Gene Ther.; vol. 6(12): pp. 1561-1573, 1995
-   Kim S. H. et al. Expression and Characterization of Recombinant Single-chain Fv and Fv Fragments Derived from a Set of Catalytic Antibodies. Mol. Immunol.; vol. 34(12-13): pp. 891-906, 1997
-   Choi C. W. et al. Biodistribution of 18F- and 125I-labelled Anti-Tac Disulfide-stabilized Fv Fragments in Nude Mice with Interleukin 2 a Receptor-positive Tumor Xenografts. Cancer Research; vol. 55: pp. 5323-5329, 1995
-   Colcher D. et al. Pharmacokinetics and Biodistribution of Genetically-engineered Antibodies. Q J Nucl Med.; vol. 42(4): pp. 225-41, 1998
-   Pavlinkova G. et al. Pharmacokinetics and Biodistribution of Engineered Single-chain Antibody Constructs of MAb CC49 in Colon Carcinoma Xenografts. J. Nucl. Med.; vol. 40(9): pp. 1536-46, 1999
-   Antibody Engineering Page, IMT, University of Marburg, FRG: http://aximt1.imt.uni-marburg.de/_rek/indexfenster.html
-   Hunkapiller M. et al. A Microchemical Facility for the Analysis and Synthesis of Genes and Proteins. Nature; vol. 310(5973): pp. 105-11, 1984
-   Xia X. and Li W. H. What Amino Acid Properties Affect Protein Evolution. J. Mol. Evol.; vol. 47(5): pp. 557-64, 1998
-   Sandberg M. et al. New Chemical Descriptors Relevant for the Design of Biologically Active Peptides. A Multivariate Characterization of 87 Amino Acids. J. Med. Chem.; vol. 41(14): pp. 2481-91, 1998
-   Hopp T. P. and Woods K. R. Prediction of Protein Antigenic Determinants from Amino Acid Sequences. Proc. Natl. Acad. Sci., U.S.A.; vol. 78: pp. 3824, 1981
-   Bradford, M. A Rapid and Sensitive Method for the Quantitation of Microgram Quantities of Protein Utilizing the Principle of Protein-dye Binding. Anal. Biochem.; vol. 72: pp. 248-54, 1976
-   Lowry, O. J. Biol. Chem.; vol. 193, pp. 265, 1951
-   Lei S. P. et al. Characterization of the Erwinia Carotovora pelB Gene and its Product Pectate Lyase. J. Bacteriol.; vol. 169: pp. 4379-83, 1987
-   Chou P. Y. and Fasman G. D. Prediction of Protein Conformation. Biochemistry; vol. 13(2): pp. 222-45, 1974
-   Lang L. and Eckelmann W. C. One-step Synthesis of 18F labeled [18F]-N-succinimidyl 4-(fluoromethyl) benzoate for Protein Labeling. Appl. Radiat. Isot.; vol. 45: pp. 1155-63, 1994
-   Sambrook et al.; Glover (ed.). DNA Cloning: A Practical Approach. IRL Press, Ltd., Oxford, U.K.; vol. I, II, 1985
-   Benton and Davis. Screening Lambdagt Recombinant Clones by Hybridization to Single Plaques in situ. Science; vol. 196(4286): pp. 180-2, 1977
-   Clemmons D. R. IGF Binding Proteins and their Functions. Mol. Reprod. Dev.; vol. 35: pp. 368-374, 1993
-   Loddick S. A. et al. Displacement of Insulin-like Growth Factors from their Binding Proteins as a Potential Treatment for Stroke. Proc. Natl. Acad. Sci., U.S.A.; vol. 95: pp. 1894-1898, 1998
-   Swift G. H. et al. Tissue-specific expression of the rat pancreatic elastase I gene in transgenic mice. Cell; vol. 38: pp. 639-646, 1984
-   Hanahan D. Heritable formation of pancreatic beta-cell tumours in transgenic mice expressing recombinant insulin/simian virus 40 oncogenes. Nature; vol. 315: pp. 115-122, 1985
-   Grosschedl R. et al. Introduction of a mu immunoglobulin gene into the mouse germ line: specific expression in lymphoid cells and synthesis of functional antibody. Cell; vol. 38: pp. 647-658, 1984
-   Leder A. et al. Consequences of widespread deregulation of the c-myc gene in transgenic mice: multiple neoplasms and normal development. Cell; vol. 45: pp. 485-495, 1986
-   Pinkert C. A. et al. An albumin enhancer located 10 kb upstream functions along with its promoter to direct efficient, liver-specific expression in transgenic mice. Genes Dev.; vol. 1: pp. 268-276, 1987
-   Krumlauf R. et al. Developmental regulation of alpha-fetoprotein genes in transgenic mice. Mol. Cell. Biol.; vol. 5: pp. 1639-1648, 1985
-   Kelsey G. D. et al. Species- and tissue-specific expression of human alpha 1-antitrypsin in transgenic mice. Genes Dev.; vol. 1: pp. 161-171, 1987
-   Magram J. et al. Developmental regulation of a cloned adult beta-globin gene in transgenic mice. Nature; vol. 315: pp. 338-340, 1985
-   Readhead C. et al. Expression of a myelin basic protein gene in transgenic shiverer mice: correction of the dysmyelinating phenotype. Cell; vol. 48: pp. 703-712, 1987
-   Shani M. Tissue-specific expression of rat myosin light-chain 2 gene in transgenic mice. Nature; vol. 314: pp. 283-286, 1985
-   Mason A. J. et al. The hypogonadal mouse: reproductive functions restored by gene therapy. Science; vol. 234: pp. 1372-1378, 1986
-   Smith D. B. and Johnson K. S. Single-step purification of polypeptides expressed in Escherichia coli as fusions with glutathione S-transferase. Gene; vol. 67: pp. 31-40, 1988
-   Lei S. P. et al. Characterization of the Erwinia carotovora pelB gene and its product pectate lyase. J. Bacteriol., vol. 169: pp. 4379, 1987
-   Kim S. H. et al. Expression and characterization of recombinant single-chain Fv and Fv fragments derived from a set of catalytic antibodies. Mol. Immunol., vol. 34: pp. 891-906, 1997
-   Cale J. M. et al. Optimization of a reverse transcription-polymerase chain reaction (RT-PCR) mass assay for low-abundance mRNA. Methods Mol. Biol.; vol. 105: pp. 351-71, 1998
-   Weis J. H. et al. Detection of rare mRNAs via quantitative RT-PCR. Trends Genet.; vol. 8(8): pp. 263-4, 1992
-   Frohman M. A. On beyond classic RACE (rapid amplification of cDNA ends). PCR Methods Appl.; vol. 4(1): pp. S40-58, 1994
-   Adams P. D. et al. Extending the limits of molecular replacement through combined simulated annealing and maximum-likelihood refinement. Acta Crystallogr. D. Biol. Crystallogr.; vol. 55 (Pt 1): pp. 181-90, 1999
-   Schwarze S. R. et al. In Vivo Protein Transduction: Delivery of a Biologically Active Protein into the Mouse. Science; vol. 285: pp. 1565-72, 1999
-   Hoffman R. M. Topical liposome targeting of dyes, melanins, genes, and proteins selectively to hair follicles. J. Drug Target.; vol. 5(2): pp. 67-74, 1998
-   Pluckthun A. et al. Catalytic antibodies: contributions from engineering and expression in Escherichia coli. Ciba Found. Symp.; vol. 159: pp. 103-12; discussion 112-7, 1991
-   Guogiang J. et al. Dimerization Inhibits the Activity of Receptor-like Protein-tyrosine Phosphatase alpha. Nature; vol. 401: pp. 606-610, 1999
-   BIC, Explorer, Business Opportunities in Technology Commercialization.
-   Illanes A. Stability of biocatalysts. Elec. J. Biotech., vol. 2(1): pp. 7-15, 1999
-   DeSantis G. and Jones J. B. Chemical modification of enzymes for enhanced functionality. Curr. Op. Biotech., vol. 10(4): pp. 324-340, 1999
-   Govardhan C. P. Crosslinking of enzymes for improved stability and performance. Curr Opin Biotechnol. Aug; vol. 10(4): pp. 331-5, 1999
-   Beguin P. Hybrid enzymes. Curr. Op. Biotech., vol. 10(4): pp. 336-340, 1999
-   Haring D. and Schreier P. Cross-linked enzyme crystals. Curr Opin Chem Biol.; vol. 3(1): pp. 35-8, 1999
-   Moreno-Hagelsieb G. and Soberon X. Protein engineering as a powerful tool for the chemical modification of enzymes. Biol Res.; vol. 29(1): pp. 127-40, 1996
-   Jaeger K-E. et al. Bacterial Biocatalysts: Molecular Biology, Three-Dimensional Structures, and Biotechnological Applications of Lipases. Annu. Rev. Microbiol. vol. 53: pp. 315-51, 1999
-   Carrea G. and Riva S. Properties and Synthetic Applications of Enzymes in Organic Solvents. Angew Chem Int Ed Engl. Vol. 39(13): pp. 2226-2254, 2000
-   Stemmer W. P. C. Rapid Evolution of a Protein in Vitro by DNA Shuffling. Nature. Vol. 370: pp. 389-391, 1994
-   Zhao H. and Arnold F. H. Optimization of DNA Shuffling for High Fidelity Recombination. Nucleic Acids Res. Vol. 25: pp. 1307-1308, 1997
-   Zhao H. et al. Molecular Evolution by Staggered Extension Process (StEP) in Vitro Recombination. Nat. Biotechnol. Vol. 16: pp. 258-261, 1998
-   Shao Z. et al. Random-priming in vitro Recombination: an Effective Tool for Directed Evolution. Nucleic Acids Res. Vol. 26: pp. 681-683, 1998
-   Vo-Dinh T. and Cullum B. Biosensors and Biochips: Advances in Biological and Medical Diagnostics. Fresenius J Anal Chem. Vol. 366: pp. 540-551, 2000
-   Patkar et al. Effect of Mutations in Candida Antarctica B Lipase. Chem. & Phys. of Lipids. Vol. 93, pp. 95-101, 1998
-   Rotticci-Mulder et al. Expression in Pichia Pastoris of Candida Antarctica Lipase B and Lipase B Fused to a Cellulose Binding Domain. Prot. Expr. & Purif. Vol. 21, pp. 386-392, 2001
-   Winkler & Stuckmann. Glycogen, Hyaluronate, and Some Other Polysaccharides Greatly Enhance the Formation of Exolipase by Serratia marcescens. J. Bacteriol. Vol. 138, pp. 663-670, 1979
-   Liebeton et al. Directed Evolution of an Enantioselective Lipase. Chem. & Biol. Vol. 7(9), pp. 709-718, 2000
-   Schmidt-Dannert. Recombinant Microbial Lipases for Biotechnological Applications. Bioorg. & Med. Chem. Vol. 7, pp. 2123-2130, 1999
-   Takagi et al. Enhancement of the Thermostability of Subtilisin E by Introduction of a Disulfide Bond Engineered on the Basis of Structural Comparison with a Thermophilic Serine Protease. JBC. Vol. 265(12); pages 6874-78, 1990
-   Mansfeld et al. Extreme Stabilization of a Thermolysin-like Protease by an Engineered Disulfide Bond. JBC. Vol. 272(17); pages 11152-56, 1997
-   Takagi et al. Engineering Subtilisin E for Enhanced Stability and Activity in Polar Organic Solvents. J. Biochem. Vol. 127; pages 617-25, 2000
-   Mitchinson and Wells. Protein Engineering of Disulfide Bonds in Subtilisin BPN′. Biochemistry. Vol. 28(11); pages 4807-15, 1989
-   Zhao and Arnold. Directed Evolution Converts Subtilisin E into a Functional Equivalent of Thermitase. Protein Eng. Vol. 12(1): pages 47-53, 1999

The invention claimed and described herein is not to be limited in scope by the specific embodiments, including but not limited to the deposited microorganism embodiments, herein disclosed, since these embodiments are intended as illustrations of several aspects of the invention. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims.

A number of references are cited herein, the entire disclosures of which are incorporated herein, in their entirety, by reference.

1. A method for making a stabilized protein or fragment thereof comprising: (a) selecting one or more residue pairs in a polypeptide chain or chains for cross-linking using one or more statistical criteria; and (b) cross-linking the residue pairs.
2. The method of claim 1, wherein the stabilized protein or fragment is selected from the group consisting of a hormone, a receptor, a growth factor, an enzyme and an antibody.
3. The method of claim 2, wherein the enzyme is a lipase or the antibody fragment is an Fv fragment.
4. The method of claim 1, wherein the one or more statistical criteria used for selection of residue pairs in step (a) are selected from the group consisting of statistical filter one through statistical filter six.
5. The method of claim 1, wherein tyrosine residues are cross-linked.
6. The method of claim 6, wherein cross-linking is catalyzed by a catalyst selected from the group consisting of polyhistidine, Gly-Gly-His and metalloporphyrin.
7. The method of claim 6, wherein the cross-linked tyrosine residues are introduced into the stabilized protein complex prior to cross-linking by recombinant nucleic acid methods.
8. A method for identifying a residue pair in a polypeptide chain or chains that, following substitution with tyrosine and cross-linking, is least likely to be disruptive of overall protein structure, comprising applying one or more statistical criteria selected from the group consisting of statistical filter one through statistical filter six.
9. A protein cross-linked by the method of claim 1.
10-20. (canceled)